Abstract

We develop a method for measuring and localizing homology classes. 
This involves two problems. First, we define relevant notions of size for 
both a homology class and a homology group basis, using ideas from rela- 
tive homology. Second, we propose an algorithm to compute the optimal 
homology basis, using techniques from persistent homology and finite field 
algebra. Classes of the computed optimal basis are localized with cycles
conveying their sizes. The algorithm runs in $O(\beta^4 n^3 \log^2 n)$ time, where
$n$ is the size of the simplicial complex and $\beta$ is the Betti number of the
homology group.

1 Introduction 

In recent years, the problem of computing the topological features of a space 
has drawn much attention. There are two reasons for this. The first is a general 
observation: compared with geometric features, topological features are more 
qualitative and global, and tend to be more robust. If the goal is to charac- 
terize a space, therefore, features which incorporate topology seem to be good 
candidates. 

The second reason is that topology plays an important role in a number of 
applications. Researchers in graphics need topological information to facilitate 
parameterization of surfaces and texture mapping [13, 4]. In the field of sensor 
networks, the use of homological tools is crucial for certain coverage problems 
[10]. Computational biologists use topology to study protein docking and folding 
problems [1, 8]. Finally, topological features are especially important in high 
dimensional data analysis, where purely geometric tools are often deficient, and 
full-blown space reconstruction is expensive and often ill-posed [3, 16]. 

Once we are able to compute topological features, a natural problem is to 
rank the features according to their importance. The significance of this problem
can be justified from two perspectives. First, unavoidable errors are introduced
in data acquisition, in the form of traditional signal noise, and finite sampling 
of continuous spaces. These errors may lead to the presence of many small 
topological features that are not "real", but are simply artifacts of noise or of
sampling [21]. Second, many problems are naturally hierarchical. This hierarchy,
which is a kind of multiscale or multi-resolution decomposition, implies that
we want to capture the large scale features first. See Figure 1 for examples.



Figure 1: A disk with three holes and a 2-handled torus are really more like an 
annulus and a 1-handled torus, respectively, because the large features are more 
important. 

There are a variety of ways of characterizing topological spaces in the litera- 
ture, including fundamental groups, homology groups, and the Euler character- 
istic. In this paper, we concentrate on homology groups as they are relatively 
straightforward to compute in general dimension, and provide a decent amount 
of information (more, say, than a coarse measure like the Euler characteristic). 

Ranking the homology classes according to their importance involves the 
following three subproblems. 

1. Measuring the size of a homology class: We need a way to quantify
the size of a given homology class, and this size measure should agree with
intuition. For example, in Figure 2 (center), the measure should be able 
to distinguish the one large class (of the 1-dimensional homology group) 
from the two smaller classes. Furthermore, the measure should be easy to 
compute, and applicable to homology groups of any dimension. 

2. Localizing a homology class: Given the size measure for a homology
class, we would like to find a representative cycle from this class which, in
a precise sense, has this size. For example, in Figure 2 (center), the cycles
$z_1$ and $z_2$ are well-localized representatives of their respective homology
classes, whereas $z_3$ is not.

3. Choosing a basis for a homology group: We would like to choose a
"good" set of homology classes to be the generators for the homology group
(of a fixed dimension). Suppose that $\beta$ is the dimension of this group, and
that we are using $\mathbb{Z}_2$ coefficients; then there are $2^\beta - 1$ nontrivial homology
classes in total. For a basis, we need to choose a subset of $\beta$ of these classes,
subject to the constraint that these $\beta$ classes generate the group. The criterion
of goodness for a basis is based on an overall size measure for the basis,
which relies in turn on the size measure for its constituent classes. For
instance, in Figure 3, we must choose three from the seven nontrivial
1-dimensional homology classes: $\{[z_1], [z_2], [z_3], [z_1] + [z_2], [z_1] + [z_3], [z_2] + [z_3], [z_1] + [z_2] + [z_3]\}$.
In this case, the intuitive choice is $\{[z_1], [z_2], [z_3]\}$,
as this choice reflects the fact that there is really only one large cycle.

Figure 2: A disk with three holes. Left: the underlying topological space.
Center: cycles $z_1$ and $z_2$ convey the size of their respective homology classes; $z_3$
does not. Right: geodesic balls measuring the 1-dimensional homology classes
(used in Section 3.2).







Figure 3: A topological space formed from three circles. See accompanying 
discussion in the text. 

1.1 Related Works 

There is much work that has been done in the general field of computational 
topology [2]. Examples include fast algorithms for computing Betti numbers [11, 
15], as well as techniques for relating topological spaces to their approximations 
[19, 5]; where the latter usually derive from sampled versions of the spaces. 
However, in the following we will focus only on the areas of computational 
topology which are most germane to the current study: persistent homology 
and algorithms for localizing topological features. Note that a more formal 
review of persistence will be given in Section 2.3. 
Persistent Homology. Persistent homology [12, 7, 22, 24] is designed to track
the persistences of homological features over the course of a filtration of a
topological space. At first blush, it might seem that the powerful techniques of this
theory are ideally suited to solving the problems we have set out. However,
due to their somewhat different motivation, these techniques do not quite yield 
a solution. There are two reasons for this. First, the persistence of a feature 
depends not only on the space in which the feature lives, but also on the filtering 
function chosen. In the absence of a geometrically meaningful filter, it is not 
clear whether the persistence of a feature is a meaningful representation of its 
size. Second, and more importantly, the persistence only gives information for 
homology classes which ultimately die; for classes which are intrinsically part
of the topological space, and which thus never die, the persistence is infinite. 
However, it is precisely these essential (or non-persistent) classes that we care 
about. 

In more recent work, Cohen-Steiner et al. [6] have extended persistent homol- 
ogy in such a way that essential homology classes also have finite persistences. 
This extension serves to complete the theory and has some nice properties like 
stability, duality and symmetry for triangulated manifolds. However, the per- 
sistences thus computed still depend on the filter function, and furthermore, do 
not always seem to agree with an intuitive notion of size. See Figure 4. 
Figure 4: Computing the extended persistent homology of a torus using the
height function as the filter function. The (birth, death time) pairs of the two
1-dimensional homology classes are $(t_1, t_2)$ and $(t_2, t_1)$, respectively. The
persistences are not consistent with our intuition of their sizes.

Localization of Topological Features. Zomorodian and Carlsson [23] take
a different approach to solving the localization problem. Their method starts
with a topological space and a cover, a set of spaces whose union contains the 
original space. A blowup complex is built up which contains homology classes 
of all the spaces in the cover. The authors then use persistent homology to 
identify homology classes in the blowup complex which correspond to the same
homology class in the given topological space. The persistent homology algorithm
produces a complete set of generators for the relevant homology group,
which forms a basis for the group. However, both the quality of the generators
and the complexity of the algorithm depend strongly on the choice of cover; 
there is, as yet, no suggestion of a canonical cover. 

Using Dijkstra's shortest path algorithm, Erickson and Whittlesey [14] showed 
how to localize a one-dimensional homology class with its shortest cycle. Al- 
though not explicitly mentioned, the length of this shortest cycle can be deemed 
as a measure of the size of its homology class. They proved, by an application 
of matroid theory, that finding $\beta$ linearly independent homology classes whose
sizes have the smallest sum can be achieved by a greedy method, namely, finding
the smallest homology classes one by one, subject to a linear independence
constraint. Their algorithm takes $O(n^2 \log n + n^2 \beta + n \beta^3)$ time, or $O(n^2 \beta + n \beta^3)$ if $\beta$
is nearly linear in $n$. The authors also show how the idea carries over to finding
the optimal generators of the first fundamental group, though the proof is
considerably harder in this case. Note that this work is restricted to 1-dimensional
homology classes in a 2-dimensional topological space. A similar measure was
used by Wood et al. [21] to remove topological noise from 2-dimensional surfaces.
This work also suffers from the dimension restriction. 

1.2 Our Contributions 

In this paper, we solve the three problems listed in Section 1, namely, measuring 
the size of homology classes, localizing classes, and choosing a basis for a ho- 
mology group. We define a size measure for homology classes, based on relative 
homology, using geodesic distance. This solves the first problem. For the second
problem, we localize homology classes with cycles which are strongly related to
the size measure just defined. We solve the third problem by choosing the set of
linearly independent homology classes whose sizes have the minimal sum. The
time complexity of our algorithm is $O(\beta^4 n^3 \log^2 n)$, where $n$ is the cardinality of
the given simplicial complex, and $\beta$ is the dimension of the homology group. We
assume the input of our algorithm is a simplicial complex K, i.e. a triangulation 
of the given topological space. 

Size measure and localization. In Section 3, we define the size of a homology
class $h$, $S(h)$, as the radius of the smallest geodesic ball within the
topological space which carries a cycle of $h$, $z_0 \in h$. Here a geodesic ball, $B_p^r$, is
the subset of the topological space consisting of points whose geodesic distance
from the point $p$ is no greater than $r$. The intuition behind this definition will
be further elaborated in Section 3.2. Any cycle of $h$ lying within this smallest
geodesic ball is a localized cycle of $h$.

Optimal homology basis. Although there are $2^\beta - 1$ nontrivial homology
classes, only $\beta$ of them are needed to construct the homology group, subject
to the constraint that these classes generate the group. We choose to compute 
the set whose sizes have the minimal sum, which we call the optimal homology 
basis. This basis contains as few large homology classes as possible, and thus 
captures important features effectively. 
Computing the smallest class. To compute the smallest nontrivial homology
class, we find the smallest geodesic ball, $B_{min}$, which carries any nonbounding
cycle of the given simplicial complex $K$. To find $B_{min}$, we visit all of the
vertices of $K$ in turn. For each vertex $p$, we compute the persistent homology
using the geodesic distance from $p$ as a filter. This yields the smallest geodesic
ball centered on $p$ carrying any nonbounding cycle of $K$, namely, $B_p^{r(p)}$. The
ball with the smallest $r(p)$ is exactly $B_{min}$. Once we find $B_{min}$, its radius,
$r_{min}$, is the size of the smallest class. Any nonbounding cycle of $K$ carried by
$B_{min}$ is a localized cycle of this class, and can be computed by a reduction-style
algorithm.

Computing the optimal homology basis. We use matroid theory to prove
that the optimal homology basis can be computed by a greedy method. We first
compute the smallest homology class of the given simplicial complex $K$, as
described above. We then destroy this class by sealing up one of its cycles with
new simplices. Next, we compute the smallest homology class of the updated
simplicial complex, $K'$, which is the second smallest class of the optimal
homology basis of $K$. We then destroy this class and proceed to compute the
third smallest class. The whole basis is computed in $\beta$ rounds. Theorem 4.5
establishes that this sealing technique yields the optimal homology basis. The
time to compute the optimal homology basis is $O(\beta^4 n^4)$.

An improvement using finite field linear algebra. In computing the
smallest geodesic ball $B_{min}$, we may avoid explicit computation of $B_p^{r(p)}$ for
every $p$. Instead, Theorem 5.3 suggests we visit all of the vertices in a
breadth-first fashion. For the root of the breadth-first tree, we use the explicit algorithm;
for the rest of the vertices, we need only check whether a specific geodesic ball
carries any nonbounding cycle of $K$. This latter task is not straightforward, as
some of the nonbounding cycles in this ball may be boundaries in $K$. We use
Theorem 5.5 to reduce this problem to rank computations of sparse matrices
over the $\mathbb{Z}_2$ field. The time to compute the optimal homology basis with this
improvement is $O(\beta^4 n^3 \log^2 n)$.

Consistency with existing results. We prove in Section 6 that our result 
is consistent with the low dimensional optimal result of Erickson and Whittlesey 
[14]. 

2 Preliminaries 

In this section, we briefly describe the background necessary for our work, in- 
cluding a discussion of simplicial complexes, homology groups, persistent homol- 
ogy, and relative homology. Please refer to [18] for further details in algebraic 
topology, and [12, 22, 7, 24] for persistent homology. For simplicity, we restrict 
our discussion to the combinatorial framework of simplicial homology in the $\mathbb{Z}_2$
field.

2.1 Simplicial Complex 

A $d$-dimensional simplex or $d$-simplex, $\sigma$, is the convex hull of $d + 1$ affinely
independent vertices, which means that for any of these vertices, $v_i$, the $d$ vectors
$v_j - v_i$, $j \neq i$, are linearly independent. A 0-simplex, 1-simplex, 2-simplex and
3-simplex are a vertex, edge, triangle and tetrahedron, respectively. The convex
hull of a nonempty subset of vertices of $\sigma$ is a face of $\sigma$. A simplicial complex $K$ is
a finite set of simplices that satisfies the following two conditions.

1. Any face of a simplex in K is also in K. 

2. The intersection of any two simplices in K is either empty or is a face for 
both of them. 

The dimension of a simplicial complex is the highest dimension of its simplices. 
If a subset $K_0 \subseteq K$ is a simplicial complex, it is a subcomplex of $K$.

2.2 Homology Groups 

Within a given simplicial complex $K$, a $d$-chain is a formal sum of $d$-simplices
in $K$, $c = \sum_{\sigma \in K} a_\sigma \sigma$, $a_\sigma \in \mathbb{Z}_2$. All the $d$-chains form the group of $d$-chains,
$C_d(K)$. The boundary of a $d$-chain is the sum of the $(d-1)$-faces of all the
$d$-simplices in the chain. The boundary operator $\partial_d : C_d(K) \to C_{d-1}(K)$ is a
group homomorphism.

A $d$-cycle is a $d$-chain without boundary. The set of $d$-cycles forms a subgroup
of the chain group, which is the kernel of the boundary operator, $Z_d(K) = \ker(\partial_d)$.
A $d$-boundary is the boundary of a $(d+1)$-chain. The set of $d$-boundaries
forms a group, which is the image of the boundary operator, $B_d(K) = \mathrm{img}(\partial_{d+1})$.
It is not hard to see that a $d$-boundary is also a $d$-cycle. Therefore, $B_d(K)$ is a
subgroup of $Z_d(K)$. A $d$-cycle which is not a $d$-boundary, $z \in Z_d(K) \setminus B_d(K)$, is
a nonbounding cycle.

The $d$-dimensional homology group is defined as the quotient group $H_d(K) =
Z_d(K)/B_d(K)$. An element in $H_d(K)$ is a homology class, which is a coset of
$B_d(K)$, $[z] = z + B_d(K)$ for some $d$-cycle $z \in Z_d(K)$. If $z$ is a $d$-boundary, $[z] =
B_d(K)$ is the identity element of $H_d(K)$. Otherwise, when $z$ is a nonbounding
cycle, $[z]$ is a nontrivial homology class and $z$ is called a representative cycle
of $[z]$. Cycles in the same homology class are homologous to each other, which
means their difference is a boundary.

The dimension of the homology group, which is referred to as the Betti
number, is $\beta_d = \dim(H_d(K)) = \dim(Z_d(K)) - \dim(B_d(K))$. It can be computed
with a reduction algorithm based on row and column operations of the boundary
matrices [18]. Various reduction algorithms have been devised for different
purposes [17, 12, 22].
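The formula above can be checked on small examples. The following is a minimal sketch, assuming the complex is given as a face-closed list of vertex tuples; it uses plain Gaussian elimination over $\mathbb{Z}_2$ on bitmask rows rather than any of the specific reduction algorithms cited, and the function names `gf2_rank` and `betti_numbers` are illustrative only.

```python
def gf2_rank(rows):
    """Rank over Z2 of a matrix whose rows are Python ints used as bitmasks."""
    basis = {}  # leading-bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # eliminate the leading bit (addition in Z2)
    return len(basis)


def betti_numbers(simplices):
    """Betti numbers over Z2 of a face-closed complex given as vertex tuples."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    index = {d: {s: i for i, s in enumerate(by_dim[d])} for d in by_dim}
    rank = {0: 0, top + 1: 0}  # the boundary operators below 1 and above top vanish
    for d in range(1, top + 1):
        rows = []
        for s in by_dim[d]:
            mask = 0
            for k in range(len(s)):
                face = s[:k] + s[k + 1:]  # drop one vertex to get a (d-1)-face
                mask |= 1 << index[d - 1][face]
            rows.append(mask)
        rank[d] = gf2_rank(rows)
    # beta_d = dim Z_d - dim B_d = (n_d - rank of d-th boundary) - rank of (d+1)-th
    return [len(by_dim[d]) - rank[d] - rank[d + 1] for d in range(top + 1)]
```

For a hollow triangle this returns one connected component and one 1-dimensional class; filling in the triangle removes the latter.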
The following notation will prove convenient. We say that a $d$-chain $c \in
C_d(K)$ is carried by a subcomplex $K_0$ when all the $d$-simplices of $c$ belong to
$K_0$, formally, $c \subseteq K_0$. We denote by $\mathrm{vert}(K)$ the set of vertices of the simplicial
complex $K$, and by $\mathrm{vert}(c)$ that of the chain $c$.

In this paper, we focus on the simplicial homology over the finite field $\mathbb{Z}_2$. In
this case, a $d$-chain corresponds to an $n_d$-dimensional vector, where $n_d$ is the number
of $d$-simplices in $K$. Computing the boundary of a $d$-chain corresponds to
multiplying the chain vector with a boundary matrix $[b_1, \ldots, b_{n_d}]$, whose column
vectors are boundaries of $d$-simplices in $K$. By slightly abusing the notation,
we call the boundary matrix $\partial_d$.

2.3 Persistent Homology 

Given a topological space $X$ and a filter function $f : X \to \mathbb{R}$, persistent homology
studies the homology classes of the sublevel sets, $X^t = f^{-1}(-\infty, t]$. A nontrivial
homology class in $X^{t_1}$ may become trivial in $X^{t_2}$, $t_1 < t_2$ (formally, when its
image under the homomorphism induced by the inclusion $X^{t_1} \subseteq X^{t_2}$ is trivial).
Persistent homology tries to capture
this phenomenon by measuring the times at which a homology class is born
and dies. The persistence, or life time, of the class is the difference between
its death and birth times. Those with longer lives tell us something about the
global structure of the space $X$, as described by the filter function. Note that
the essential, that is, nontrivial homology classes of the given topological space
$X$ will never die.

Edelsbrunner et al. [12] devised an $O(n^3)$ algorithm to compute the persistent
homology. Its inputs are a simplicial complex $K$ and a filter function $f$,
which assigns each simplex in $K$ a real value. Simplices of $K$ are sorted in
ascending order according to their filter function values. This order is actually
the order in which simplices enter the sublevel set $f^{-1}(-\infty, t]$ while $t$ increases.
For simplicity, in this paper we call this ordering the simplex-ordering of $K$
with regard to $f$. The output of the algorithm is the birth and death times of
homology classes.

The algorithm performs column operations on an overall incidence matrix,
$D$, whose rows and columns correspond to simplices in $K$. An entry $D(i, j) = 1$
if and only if the simplex $\sigma_i$ belongs to the boundary of the simplex $\sigma_j$. To some
extent, $D$ is a big boundary matrix which can accommodate chains of arbitrary
dimension. Columns and rows of $D$ are sorted in ascending order according to
the function values of simplices. The algorithm performs the column reduction
from left to right, recording low($i$) as the lowest nonzero entry of each column
$i$. If column $i$ is reduced to a zero column, low($i$) does not exist. To reduce
column $i$, we repeatedly find column $j$ satisfying $j < i$ and low($j$) = low($i$); we
then add column $j$ to column $i$, until column $i$ becomes a zero column or we
cannot find a qualified $j$ anymore.

The reduction of D can be written as a matrix multiplication, 

R = DV, (1)

where $R$ is the reduced matrix and $V$ is an upper triangular matrix. Columns
of $V$ corresponding to zero columns of $R$ whose corresponding simplices are
$d$-dimensional form a basis of the cycle group $Z_d(K)$.

After the reduction, each pairing, low($i$) = $j$, corresponds to a homology
class whose birth time is $f(\sigma_j)$ and death time is $f(\sigma_i)$. A simplex $\sigma_i$ that is
not paired, namely, neither low($i$) = $j$ nor low($j$) = $i$ for any $j$, corresponds
to an essential homology class, namely, a nontrivial homology class of $K$. An
essential homology class only has a birth time, namely, $f(\sigma_i)$, and it never dies.
Therefore, all the nontrivial homology classes of $K$ have infinite persistences.
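The column reduction and the resulting pairing can be sketched in a few lines. This is a minimal illustration, assuming each column of $D$ is supplied as the set of its nonzero row indices and that simplices are already indexed in simplex-ordering; the name `reduce_columns` is hypothetical and the sketch does not track the matrix $V$.

```python
def reduce_columns(columns):
    """Left-to-right column reduction over Z2.

    columns[j] is the set of row indices i with D(i, j) = 1.
    Returns (pairs, unpaired): pairs holds (creator, destroyer) index pairs,
    unpaired holds the indices of simplices giving essential classes.
    """
    reduced = []   # reduced columns, i.e. the columns of R
    owner = {}     # low value -> index of the reduced column having that low
    pairs = []
    for j, col in enumerate(columns):
        col = set(col)
        # While an earlier column has the same lowest entry, add it (mod 2).
        while col and max(col) in owner:
            col ^= reduced[owner[max(col)]]
        reduced.append(col)
        if col:
            low = max(col)
            owner[low] = j
            pairs.append((low, j))  # simplex `low` is born, simplex j kills it
    paired = {i for pair in pairs for i in pair}
    unpaired = [i for i in range(len(columns)) if i not in paired]
    return pairs, unpaired


# Hollow triangle: vertices 0, 1, 2, then edges 3 = {0,1}, 4 = {1,2}, 5 = {0,2}.
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}]
pairs, unpaired = reduce_columns(triangle)
```

In this example the unpaired simplices are vertex 0 (the connected component) and edge 5 (the 1-dimensional cycle), matching the two essential classes of a circle.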

2.4 Relative Homology 

Given a simplicial complex $K$ and a subcomplex $K_0 \subseteq K$, we may wish to study
the structure of $K$ by ignoring all the chains in $K_0$. We consider two $d$-chains,
$c_1$ and $c_2$, to be the same if their difference is carried by $K_0$. The objects we
are interested in are then defined as these equivalence classes, which form a
quotient group, $C_d(K, K_0) = C_d(K)/C_d(K_0)$. We call it the group of relative
chains, whose elements (cosets) are called relative chains.

The boundary operator $\partial_d : C_d(K) \to C_{d-1}(K)$ induces a relative boundary
operator from $C_d(K, K_0)$ to $C_{d-1}(K, K_0)$. Analogous to the way we define
$Z_d(K)$, $B_d(K)$ and $H_d(K)$ in $C_d(K)$, we define the group of relative cycles,
the group of relative boundaries and the relative homology group in $C_d(K, K_0)$,
denoted as $Z_d(K, K_0)$, $B_d(K, K_0)$ and $H_d(K, K_0)$, respectively. An element in
$Z_d(K, K_0) \setminus B_d(K, K_0)$ is a nonbounding relative cycle.

The following notation will prove convenient. We define a homomorphism
$\phi_{K_0} : C_d(K) \to C_d(K, K_0)$ mapping $d$-chains to their corresponding relative
chains, $\phi_{K_0}(c) = c + C_d(K_0)$. This homomorphism induces another homomorphism,
$\phi^*_{K_0} : H_d(K) \to H_d(K, K_0)$, mapping homology classes of $K$ to their
corresponding relative homology classes, $\phi^*_{K_0}(h) = \phi_{K_0}(z) + B_d(K, K_0)$ for any
$z \in h$.

Given a $d$-chain $c \in C_d(K)$, its corresponding relative chain $\phi_{K_0}(c)$ is a relative
cycle if and only if $\partial_d(c)$ is carried by $K_0$. Furthermore, it is a relative boundary
if and only if there is a $(d+1)$-chain $c' \in C_{d+1}(K)$ such that $c - \partial_{d+1}(c')$ is
carried by $K_0$.
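The relative-cycle condition is easy to test directly. The sketch below assumes chains over $\mathbb{Z}_2$ are represented as sets of sorted vertex tuples and that $K_0$ is given as a set of its simplices; `boundary` and `is_relative_cycle` are illustrative names.

```python
def boundary(chain):
    """Z2 boundary of a chain given as a set of simplices (sorted vertex tuples)."""
    out = set()
    for s in chain:
        if len(s) == 1:
            continue  # a vertex has empty boundary
        for k in range(len(s)):
            out ^= {s[:k] + s[k + 1:]}  # faces appearing twice cancel mod 2
    return out


def is_relative_cycle(chain, k0):
    """True when the boundary of `chain` is carried by the subcomplex k0."""
    return boundary(chain) <= set(k0)


# A path 0-1-2 is not a cycle in K, but it is a relative cycle as soon as
# both of its endpoints lie in K_0.
path = {(0, 1), (1, 2)}
```

Here `is_relative_cycle(path, {(0,), (2,)})` holds, whereas `is_relative_cycle(path, {(0,)})` does not.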

These ideas are illustrated in Figure 5. Although $z_1$ and $z_2$ are both nonbounding
cycles in $K$, $\phi_{K_0}(z_1)$ is a nonbounding relative cycle whereas $\phi_{K_0}(z_2)$
is only a relative boundary. Although chains $c_1$ and $c_2$ are not cycles in $K$,
$\phi_{K_0}(c_1)$ and $\phi_{K_0}(c_2)$ are relative cycles homologous to $\phi_{K_0}(z_1)$ and $\phi_{K_0}(z_2)$,
respectively.

Note that $[z_1]$ and $[z_2]$ are both nontrivial homology classes in $K$. But
their correspondences in the relative homology group may not necessarily be
nontrivial. We can see that $\phi^*_{K_0}([z_1])$ is a nontrivial relative homology class,
whereas $\phi^*_{K_0}([z_2])$ is trivial. We say that the class $[z_2]$ is carried by $K_0$. This
concept plays an important role in our definition of the size measure. Further
details will be given in Section 3.2.
Figure 5: A disk with two holes, whose triangulation is $K$. Simplices of $K$ lying
completely in the dotted rectangle form a subcomplex $K_0$. The 1-dimensional
relative homology group $H_1(K, K_0)$ has dimension 1, although $H_1(K)$ has
dimension 2. The nontrivial class $[z_2]$ is carried by $K_0$.



2.5 Rank Computations of Sparse Matrices over Finite Fields

Wiedemann [20] presented a randomized algorithm to compute the rank of a
sparse matrix over a finite field. His method performs a binary search for the
rank. For an $m \times n$ sparse matrix $A$, the algorithm starts with $s = \min(m, n)/2$.
It tests whether $s > \mathrm{rank}(A)$, and then decides whether to set $s = s/2$ or $s = 3s/2$.
For each $s$, an $s \times m$ matrix $P$ and an $n \times s$ matrix $Q$ are randomly generated several
times. If $PAQ$ is singular every time, then $s > \mathrm{rank}(A)$ with high probability.
The expected time of the algorithm is $O(n(\omega + n \log n) \log n)$, where $n$ is the
maximal dimension of the matrix and $\omega$ is the total number of nonzero entries
in $A$.



3 Defining the Problem 

In this section, we provide a technique for ranking homology classes according to 
their importance. Specifically, we solve the three problems mentioned in Section 
1 by providing 

• a meaningful size measure for homology classes that is computable in 
arbitrary dimension; 

• localized cycles which are consistent with the size measure of their homol- 
ogy classes; 

• and an optimal homology basis which distinguishes large classes from small 
ones effectively. 
3.1 The Discrete Geodesic Distance



In order to measure the size of homology classes, we need a notion of distance.
As we will deal with a simplicial complex, it is most natural to introduce a
discrete metric and corresponding distance functions. We define the discrete
geodesic distance from a vertex $p \in \mathrm{vert}(K)$, $f_p : \mathrm{vert}(K) \to \mathbb{Z}$, as follows.
For any vertex $q \in \mathrm{vert}(K)$, $f_p(q) = \mathrm{dist}(p, q)$ is the length of the shortest
path connecting $p$ and $q$ in the 1-skeleton of $K$; it is assumed that each edge
length is one, though this can easily be changed. We may then extend this
distance function from vertices to higher dimensional simplices naturally. For
any simplex $\sigma \in K$, $f_p(\sigma)$ is the maximal function value of the vertices of $\sigma$,
$f_p(\sigma) = \max_{q \in \mathrm{vert}(\sigma)} f_p(q)$. Finally, we define a geodesic ball $B_p^r$, $p \in \mathrm{vert}(K)$,
$r \geq 0$, as the subset of $K$, $B_p^r = \{\sigma \in K \mid f_p(\sigma) \leq r\}$. It is straightforward to
show that these subsets are in fact subcomplexes.
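With unit edge lengths, $f_p$ on vertices is just a breadth-first search in the 1-skeleton. The following sketch assumes the complex is a list of vertex tuples; the function names are illustrative, and vertices unreachable from $p$ are treated as infinitely far away (so they never enter a ball).

```python
from collections import deque


def geodesic_distance(simplices, p):
    """BFS distance f_p from vertex p in the 1-skeleton, unit edge lengths."""
    adj = {}
    for s in simplices:
        if len(s) == 1:
            adj.setdefault(s[0], set())
        elif len(s) == 2:
            u, v = s
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    dist = {p: 0}
    queue = deque([p])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def simplex_value(simplex, dist):
    """f_p extended to a simplex: the maximum of f_p over its vertices."""
    return max(dist[v] for v in simplex)


def geodesic_ball(simplices, p, r):
    """All simplices sigma of the complex with f_p(sigma) <= r."""
    dist = geodesic_distance(simplices, p)
    return [s for s in simplices
            if all(v in dist for v in s) and simplex_value(s, dist) <= r]
```

Because every face of a simplex has a value no larger than the simplex itself, the list returned by `geodesic_ball` is closed under taking faces, which is the subcomplex property noted above.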

3.2 Measuring the Size of a Homology Class 

Using notions from relative homology, we proceed to define the size of a homology
class as follows. Given a simplicial complex $K$, assume we are given a
collection of subcomplexes $C = \{L \subseteq K\}$. Furthermore, each of these subcomplexes
is endowed with a size. In this case, we define the size of a homology
class $h$ as the size of the smallest $L$ carrying $h$. Here we say a subcomplex
$L$ carries $h$ if $h$ has a trivial image in the relative homology group $H_d(K, L)$,
namely, $\phi^*_L(h) = B_d(K, L)$. In Figure 5, the class $[z_2]$ is carried by $K_0$, whereas
$[z_1]$ is not.

Definition 3.1. The size of a class $h$, $S(h)$, is the size of the smallest measurable
subcomplex carrying $h$, formally,

$S(h) = \min_{L \in C} \mathrm{size}(L)$ s.t. $\phi^*_L(h) = B_d(K, L)$.

To facilitate computation, we prove the following theorem. 

Theorem 3.2. The size of a homology class $h$ is the size of the smallest measurable
subcomplex carrying one of its cycles, $z \in h$, formally,

$S(h) = \min_{L \in C} \mathrm{size}(L)$ s.t. $\exists z \in h : z \subseteq L$.



Proof. As we know, for any cycle $z \in h$, the relative chain $\phi_L(z)$ is a relative
boundary if and only if there is a $(d+1)$-chain $c' \in C_{d+1}(K)$ such that $z - \partial_{d+1}(c')$
is carried by $L$. Since $z - \partial_{d+1}(c')$ is itself a cycle homologous to $z$, this means
that $h$ is carried by $L$ if and only if there exists
some cycle $z \in h$ carried by $L$. □

In this paper, we take $C$ to be the set of discrete geodesic balls, $C = \{B_p^r \mid
p \in \mathrm{vert}(K), r \geq 0\}$. The size of a geodesic ball is naturally its radius $r$.
Combining the size definition and the theorem we have just proven, we define
the size measure of homology classes as follows.
Definition 3.3. The size of a homology class is the radius of the smallest
geodesic ball carrying one of its cycles, formally,

$S(h) = \min r$ s.t. $\exists p \in \mathrm{vert}(K)$ and $z \in h : z \subseteq B_p^r$.

This smallest geodesic ball is denoted as $B_{min}(h)$ for convenience, whose radius
is $S(h)$.

In Figure 2 (right), the three geodesic balls centered at $p_1$, $p_2$ and $p_3$ are
the smallest geodesic balls carrying nontrivial homology classes $[z_1]$, $[z_2]$ and
$[z_3]$, respectively. Their radii are the sizes of the three classes. In Figure 6,
the smallest geodesic ball carrying a nontrivial homology class is the pink one
centered at $p_2$¹, not the one centered at $p_1$. Note that these geodesic balls may
not look like Euclidean balls in the embedding space.




Figure 6: On a tube, the smallest geodesic ball is centered at $p_2$, not $p_1$.



3.3 A Localized Cycle 

We would like to localize a homology class with a cycle which conveys its size. 
Define the radius of a cycle $z$ as

$\mathrm{rad}(z) = \min_{p \in \mathrm{vert}(K)} \max_{q \in \mathrm{vert}(z)} \mathrm{dist}(p, q)$,

which is a natural extension of the canonical definition of radius, e.g. of a
Euclidean ball. We define a localized cycle of a homology class $h$ as one with
the minimal radius, namely, $z_0 = \arg\min_{z \in h} \mathrm{rad}(z)$.

Based on Theorem 3.2, it is not hard to see that the size of a class $h$ is
equal to the minimal radius of its cycles, namely, $S(h) = \min_{z \in h} \mathrm{rad}(z)$, which
is exactly the radius of its localized cycles. Thus, this definition of localized
cycles agrees with our size measure for homology classes.
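For a single given cycle, the radius can be evaluated by brute force: run a breadth-first search from every vertex of $K$ and keep the center with the smallest worst-case distance to the cycle. The sketch below assumes the 1-skeleton is given as an edge list with integer vertex labels and the cycle as its vertex set; `cycle_radius` is an illustrative name, and minimizing over all cycles of a class is not attempted here.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unit-length shortest-path distances from `source` in the graph `adj`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def cycle_radius(edges, cycle_vertices):
    """Return (rad(z), center) for the cycle with the given vertex set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    best = None
    for p in sorted(adj):  # try every vertex of K as a candidate center
        dist = bfs_distances(adj, p)
        if all(q in dist for q in cycle_vertices):
            r = max(dist[q] for q in cycle_vertices)
            if best is None or r < best[0]:
                best = (r, p)
    return best


# A hexagonal rim, optionally with a hub vertex 6 joined to every rim vertex.
rim = [(i, (i + 1) % 6) for i in range(6)]
spokes = [(6, i) for i in range(6)]
```

On the bare hexagon the rim cycle has radius 3; adding the hub drops its radius to 1, with the hub as center, which illustrates that the minimizing center need not lie on the cycle.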

Given a homology class $h$, any of its cycles carried by $B_{min}(h)$ has the radius
$S(h)$, and thus is localized. In Figure 2, $z_1$ and $z_2$ are localized cycles of $[z_1]$
and $[z_2]$ because they are carried by $B_{min}([z_1])$ and $B_{min}([z_2])$, respectively.

¹ This geodesic ball actually carries the shortest cycle of the class using the definition of
Erickson and Whittlesey [14]. We will discuss this in Section 6.

Remark 3.4. Another quantity which can describe the size of a cycle is the
diameter

$\mathrm{diam}(z) = \max_{p, q \in \mathrm{vert}(z)} \mathrm{dist}(p, q)$.

We deliberately avoid this quantity because we conjecture that computing the cycle
with the minimal diameter, $\arg\min_{z \in h} \mathrm{diam}(z)$, is NP-complete. On the other
hand, our definition of a localized cycle gives a 2-approximation of the minimal
diameter, formally,

$\mathrm{diam}(\arg\min_{z \in h} \mathrm{rad}(z)) \leq 2 \min_{z \in h} \mathrm{diam}(z)$,

which can be shown to be a tight bound.
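One way to see the 2-approximation bound of Remark 3.4 is the following sketch, which may differ from the authors' own argument. It writes $z_0$ for a minimal-radius cycle of $h$ with center $p_0$ and $z^*$ for a minimal-diameter cycle of $h$; both symbols are introduced here for illustration only.

```latex
% z_0 = argmin_{z \in h} rad(z) with center p_0;  z^* = argmin_{z \in h} diam(z).
\begin{align*}
\operatorname{diam}(z_0)
  &= \max_{q, q' \in \operatorname{vert}(z_0)} \operatorname{dist}(q, q') \\
  &\le \max_{q, q' \in \operatorname{vert}(z_0)}
       \bigl( \operatorname{dist}(q, p_0) + \operatorname{dist}(p_0, q') \bigr)
     && \text{(triangle inequality)} \\
  &\le 2 \operatorname{rad}(z_0) \le 2 \operatorname{rad}(z^*)
     && \text{(minimality of } z_0 \text{)} \\
  &\le 2 \operatorname{diam}(z^*)
     && \text{(choose the center at a vertex of } z^* \text{)}.
\end{align*}
```

The last step uses that $\mathrm{rad}(z)$ minimizes over all vertices of $K$, so restricting the center to a vertex of $z^*$ can only increase the value, and that value is at most $\mathrm{diam}(z^*)$.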

3.4 The Optimal Homology Basis 

There are $2^{\beta_d} - 1$ nontrivial homology classes. However, we only need $\beta_d$ of
them to form a basis. The basis should be chosen wisely so that we can easily
distinguish important homology classes from noise. See Figure 3 for an example.
There are $2^3 - 1 = 7$ nontrivial homology classes; we need three of them to
form a basis. We would prefer to choose $\{[z_1], [z_2], [z_3]\}$ as a basis, rather than
$\{[z_1] + [z_2] + [z_3], [z_2] + [z_3], [z_3]\}$. The former indicates that there is one big
cycle in the topological space, whereas the latter gives the impression of three
large classes.

In keeping with this intuition, the optimal homology basis is defined as fol- 
lows. 

Definition 3.5. The optimal homology basis is the basis for the homology group
whose elements' sizes have the minimal sum, formally,

$\arg\min_{\{h_1, \ldots, h_{\beta_d}\}} \sum_{i=1}^{\beta_d} S(h_i)$ s.t. $\dim(\{h_1, \ldots, h_{\beta_d}\}) = \beta_d$.

This definition guarantees that large homology classes appear as few times 
as possible in the optimal homology basis. In Figure 3, the optimal basis will 
be $\{[z_1], [z_2], [z_3]\}$, which has only one large class.
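When the sizes of all classes are known, a minimal-sum basis can be selected greedily: scan the classes in order of increasing size and keep a class whenever it is linearly independent (over $\mathbb{Z}_2$) of those already kept. The sketch below assumes each class is encoded as an integer bitmask of its coordinates in some fixed basis; the sizes in the example are hypothetical numbers chosen for illustration, not values from the paper's figures.

```python
def greedy_basis(classes):
    """Pick a minimal-sum basis from (size, vector) pairs.

    Each vector is an int bitmask of Z2 coordinates. Classes are scanned in
    order of increasing size and kept when independent of those already kept.
    """
    pivots = {}   # leading-bit position -> reduced vector with that leading bit
    chosen = []
    for size, vec in sorted(classes):
        v = vec
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v          # independent: extend the basis
                chosen.append((size, vec))
                break
            v ^= pivots[lead]             # reduce against the current basis
    return chosen


# Hypothetical sizes for the seven nontrivial classes of a rank-3 group:
# two small generators (bits 0 and 1) and one large one (bit 2).
classes = [(1, 0b001), (1, 0b010), (2, 0b011), (5, 0b100),
           (5, 0b101), (5, 0b110), (5, 0b111)]
basis = greedy_basis(classes)
```

With these assumed sizes the greedy scan keeps the two small generators, skips their sum as dependent, and then keeps a single large class, so only one large class appears in the basis.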

4 The Algorithm 

In this section, we introduce an algorithm to measure and localize the optimal 
homology basis as defined in Definition 3.5. We first introduce an algorithm to 
measure and localize the smallest homology class, namely, Measure-Smallest(K), 
which uses the persistent homology algorithm. Based on this procedure, we 
provide the algorithm Measure-All(K), which measures and localizes the optimal
homology basis. The algorithm takes $O(\beta_d^4 n^4)$ time, where $\beta_d$ is the Betti
number and $n$ is the cardinality of the input $K$.
4.1 Measuring and Localizing the Smallest Homology Class



The procedure Measure-Smallest(K) measures and localizes the smallest nontrivial
homology class, namely, the one with the smallest size,

$h_{min} = \arg\min_{h} S(h)$.

The output of this procedure will be a pair $(S_{min}, z_{min})$, where $S_{min} = S(h_{min})$
and $z_{min}$ is a localized cycle of $h_{min}$. According to the definitions, this pair is
determined by the smallest geodesic ball carrying $h_{min}$, namely, $B_{min}(h_{min})$.
Once this ball is computed, its radius is $S_{min}$, and a cycle of $h_{min}$ carried by
this ball is $z_{min}$.

We first present an algorithm to compute the smallest geodesic ball carrying
$h_{min}$, i.e. $B_{min}(h_{min})$. Second, we introduce the technique for finding $z_{min}$ from
the computed ball. The two corresponding procedures are Bmin and Localized-Cycle.
See Algorithm 1 for pseudocode of the procedure Measure-Smallest(K).



Algorithm 1 Measure-Smallest(K)
Goal: measuring and localizing $h_{min}$.
Input: K: the given simplicial complex.
Output: $S_{min}$, $z_{min}$: the size and a localized cycle of $h_{min}$.
1: $(r_{min}, p_{min})$ = Bmin(K)
2: $z_{min}$ = Localized-Cycle($p_{min}$, $r_{min}$, K)



4.1.1 Computing $B_{min}(h_{min})$

It is straightforward to see that $B_{min}(h_{min})$ is also the smallest
geodesic ball carrying any nontrivial homology class of K. It can be computed
by finding, for each vertex, the smallest geodesic ball centered at that vertex
that carries a nontrivial class, and then comparing these balls. See
Algorithm 2 for the procedure.

Theorem 4.1. Procedure Bmin(K) computes $B_{min}(h_{min})$.

Proof. For each vertex $p$, we compute the smallest geodesic ball centered at
$p$ carrying any nontrivial homology class, namely, $B_p^{r(p)}$. We apply the
persistent homology algorithm to K with the filter function $f_p$. Notice that
a geodesic ball $B_p^r$ is the sublevel set $f_p^{-1}(-\infty, r] \subseteq K$.
Nontrivial homology classes of K are essential homology classes in the
persistent homology algorithm. (For clarity, in the rest of this paper, we use
"essential homology classes" and "nontrivial homology classes of K"
interchangeably.) Therefore, the birth time of the first essential homology
class is $r(p)$, and the subcomplex $f_p^{-1}(-\infty, r(p)]$ is $B_p^{r(p)}$.

When all the $B_p^{r(p)}$'s are computed, we compare their radii and pick the
smallest one as $B_{min}(h_{min})$. □

Algorithm 2 Bmin(K)
Goal: computing $B_{min}(h_{min})$.
Input: K: the given simplicial complex.
Output: $p_{min}$, $r_{min}$: the center and radius of $B_{min}(h_{min})$.
1: $r_{min} = +\infty$
2: for $p \in$ vert(K) do
3:    apply the persistent homology algorithm to K with filter function $f_p$
4:    $r(p)$ = birth time of the first essential homology class
5:    if $r(p) < r_{min}$ then
6:       $p_{min} = p$
7:       $r_{min} = r(p)$
8:    end if
9: end for
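
The loop of Algorithm 2 can be sketched in Python as follows. This is a minimal
sketch under several assumptions that are not part of the paper: simplices are
sorted tuples of integer vertices, the complex is connected, geodesic distance
is the hop count on the 1-skeleton, and persistence is computed by the
textbook $\mathbb{Z}_2$ column reduction rather than an optimized
implementation. The function names and the test complex (a 4-cycle with a
tail) are hypothetical.

```python
from collections import deque
from itertools import combinations

def geodesic_distances(vertices, edges, source):
    """Breadth-first search distances on the 1-skeleton (unit edge lengths)."""
    adj = {v: [] for v in vertices}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def first_essential_birth(simplices, dist, d):
    """Birth time of the first essential d-class under the filter f_p."""
    f = lambda s: max(dist[v] for v in s)  # filter value of a simplex
    order = sorted(simplices, key=lambda s: (f(s), len(s), s))  # faces first
    index = {s: i for i, s in enumerate(order)}
    reduced = {}     # pivot row index -> reduced boundary column
    paired = set()   # simplices whose class is killed later
    positive = []    # simplices that create a class
    for j, s in enumerate(order):
        col = {index[t] for t in combinations(s, len(s) - 1)} if len(s) > 1 else set()
        while col and max(col) in reduced:
            col ^= reduced[max(col)]  # add an earlier column over Z2
        if col:
            reduced[max(col)] = col
            paired.add(max(col))
        else:
            positive.append(j)
    births = [f(order[j]) for j in positive
              if j not in paired and len(order[j]) == d + 1]
    return min(births) if births else float("inf")

def bmin(vertices, simplices, d=1):
    """Center and radius of the smallest ball carrying an essential d-class."""
    edges = [s for s in simplices if len(s) == 2]
    p_min, r_min = None, float("inf")
    for p in vertices:
        dist = geodesic_distances(vertices, edges, p)
        r = first_essential_birth(simplices, dist, d)
        if r < r_min:
            p_min, r_min = p, r
    return p_min, r_min

# Hypothetical test complex: the 4-cycle 0-1-2-3 with a tail 3-4-5.
vertices = list(range(6))
simplices = [(v,) for v in vertices] + [(0, 1), (1, 2), (2, 3), (0, 3), (3, 4), (4, 5)]
result = bmin(vertices, simplices)
```

A simplex is essential here when its reduced column is zero and it never
appears as the pivot of a later column. Vertices on the cycle see the cycle
born at radius 2, while the tail vertices need radius 3 and 4.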



Once $B_{min}(h_{min})$ is computed, its radius is the size of $h_{min}$. Any
cycle of $h_{min}$ carried by $B_{min}(h_{min})$ is a localized cycle of
$h_{min}$. Next, we explain how to compute one such localized cycle.

4.1.2 Computing a Localized Cycle of $h_{min}$

The procedure Localized-Cycle($p_{min}$, $r_{min}$, K) computes a localized
cycle of $h_{min}$. We assume that $B_{min}(h_{min})$, the smallest geodesic
ball carrying the smallest homology class, carries exactly one nontrivial
homology class, i.e. $h_{min}$ itself (see Footnote 2). Any cycle carried by
this ball which is nonbounding in K is a cycle of $h_{min}$, and thus is a
localized cycle of $h_{min}$. Therefore, we first compute a basis for all the
cycles carried by $B_{min}(h_{min})$. Second, we check elements in this basis
one by one until we find one which is nonbounding in K. See Algorithm 3 for
the procedure. Note that we use the algorithm of Wiedemann [20] for rank
computation, because the related matrices are sparse.

Footnote 2: This assumption may not necessarily be true. It is possible that
$B_{min}(h_{min})$ carries two or more nontrivial classes. Suppose $p_{min}$ is
the center of $B_{min}(h_{min})$. Then the proof can be easily modified to deal
with this case, by fixing an order on simplices with the same function value
$f_{p_{min}}$, and simulating this order on $f_{p_{min}}$, i.e. treating
$f_{p_{min}}(\sigma_1) < f_{p_{min}}(\sigma_2)$ if $\sigma_1$ comes before
$\sigma_2$ (even though $f_{p_{min}}(\sigma_1) = f_{p_{min}}(\sigma_2)$).

Algorithm 3 Localized-Cycle($p_{min}$, $r_{min}$, K)
Goal: compute a localized cycle of $h_{min}$.
Input: $p_{min}$, $r_{min}$: the center and radius of $B_{min}(h_{min})$.
       K: the given simplicial complex.
Output: $z_{min}$: a localized cycle of $h_{min}$.
1: $rank_0 = rank(\partial_{d+1})$
2: construct $\partial_d'$ by picking columns of $\partial_d$ whose
   corresponding simplices belong to $B_{min}(h_{min})$
3: reduce $\partial_d'$ and get $R$ and $V$
4: for $z$ = columns in $V$ corresponding to zero columns in $R$ do
5:    $rank_1 = rank([z, \partial_{d+1}])$
6:    if $rank_1 \neq rank_0$ then
7:       $z_{min} = z$
8:       break
9:    end if
10: end for

Theorem 4.2. The procedure Localized-Cycle($p_{min}$, $r_{min}$, K) computes a
localized cycle of $h_{min}$.

Proof. The cycles carried by $B_{min}(h_{min})$ form a vector space

$$Z_d(K) \cap C_d(B_{min}(h_{min})).$$

We compute its basis by column reducing the boundary matrix restricted to
$B_{min}(h_{min})$. After the reduction, each zero column corresponds to an
element of the basis. More specifically, we compute the basis as follows. We
first construct a matrix $\partial_d'$ with columns of the boundary matrix
$\partial_d$ whose corresponding simplices belong to $B_{min}(h_{min})$. Next
we perform a column reduction on this matrix from left to right, as in the
persistent homology algorithm. The reduction corresponds to a matrix
multiplication

$$R = \partial_d' V,$$

where $R$ is the reduced matrix and $V$ is an upper triangular matrix. The
columns in $V$ corresponding to zero columns in $R$ form the basis of cycles
carried by $B_{min}(h_{min})$.

Next, we check elements in this basis one by one to find one which is
nonbounding in K. An element of this basis, $z$, is nonbounding in K if and
only if it cannot be expressed as a linear combination of boundaries of K.
Since columns of the boundary matrix $\partial_{d+1}$ generate $B_d(K)$, we just
need to compute the rank of the matrix $[z, \partial_{d+1}]$ and compare it
with the rank of $\partial_{d+1}$. The cycle $z$ is nonbounding in K if and
only if these two ranks are different. □
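
The rank comparison at the end of the proof can be sketched in Python as
follows. The sketch assumes chains are encoded as integers with one bit per
$d$-simplex and replaces Wiedemann's algorithm [20] with plain Gaussian
elimination over $\mathbb{Z}_2$, which is adequate only for small examples. The
names `gf2_rank` and `is_nonbounding` and the five-edge complex are
hypothetical.

```python
def gf2_rank(vectors):
    """Rank over Z2 of vectors encoded as ints (bit k = k-th coordinate)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # XOR only when it clears b's leading bit
        if v:
            basis.append(v)
    return len(basis)

def is_nonbounding(z, boundary_columns):
    """z is nonbounding iff rank([z, d_{d+1}]) differs from rank(d_{d+1})."""
    return gf2_rank([z] + boundary_columns) != gf2_rank(boundary_columns)

# Hypothetical complex on vertices 0..3 with one bit per edge.
E01, E02, E12, E13, E23 = 1, 2, 4, 8, 16
boundary_2 = [E12 | E13 | E23]   # only the triangle (1,2,3) is filled in
hollow = E01 | E02 | E12         # cycle around the unfilled triangle (0,1,2)
filled = E12 | E13 | E23         # boundary of the filled triangle

checks = (is_nonbounding(hollow, boundary_2),
          is_nonbounding(filled, boundary_2),
          is_nonbounding(hollow ^ filled, boundary_2))
```

The third check adds a boundary to the hollow cycle; the result stays in the
same homology class, so it is still reported as nonbounding.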

4.2 The Optimal Homology Basis 

In this section, we present the algorithm for computing the optimal homology
basis defined in Definition 3.5, namely, $\mathcal{H}_d$. We first show that the
optimal homology basis can be computed in a greedy manner. Second, we
introduce an efficient greedy algorithm.

4.2.1 Computing $\mathcal{H}_d$ in a Greedy Manner

Recall that the optimal homology basis is

$$\mathcal{H}_d = \operatorname*{argmin}_{\{h_1, \ldots, h_{\beta_d}\}} \sum_{i=1}^{\beta_d} S(h_i), \quad \text{s.t. } \dim(\{h_1, \ldots, h_{\beta_d}\}) = \beta_d.$$

We use matroid theory [9] to show that we can compute the optimal homology
basis with a greedy method. Let $H$ be the set of nontrivial $d$-dimensional
homology classes (i.e. the homology group minus the trivial class). Let $L$ be
the family of sets of linearly independent nontrivial homology classes. Then we
have the following theorem. The same result has been mentioned in [14].

Theorem 4.3. The pair $(H, L)$ is a matroid when $\beta_d > 0$.

Proof. We show $(H, L)$ is a matroid by proving the following properties.

1. The set $H$ is finite and nonempty, as $card(H) = 2^{\beta_d} - 1$.

2. For any set of linearly independent nontrivial homology classes, its subsets
   are also linearly independent. Therefore, elements in $L$ are independent
   subsets of $H$, and $L$ is hereditary.

3. For any two sets of linearly independent classes $l_1, l_2 \in L$ such that
   $card(l_1) < card(l_2)$, we can always find a homology class
   $h \in l_2 \setminus l_1$ such that $l_1 \cup \{h\}$ is still linearly
   independent. Otherwise, any element in $l_2$ is dependent on $l_1$. This
   means

   $$\dim(l_2) \le \dim(l_1) = card(l_1) < card(l_2),$$

   which contradicts the linear independence of $l_2$. Therefore, $(H, L)$
   satisfies the exchange property.

□

We construct a weighted matroid by assigning each nontrivial homology
class its size as the weight. This weight function is strictly positive because
a nontrivial homology class cannot be carried by a geodesic ball with radius
zero. According to matroid theory, we can compute the optimal homology basis
with a naive greedy method as follows.

1. Sort elements in $H$ into an order which is monotonically increasing
   according to size, namely,

   $$seq(H) = (h_1, h_2, \ldots, h_{2^{\beta_d} - 1}), \quad h_i \in H, \quad \text{such that } S(h_i) \le S(h_j) \ \forall i < j.$$

2. Repeatedly pick the smallest class from $seq(H)$ that is linearly
   independent of those we have already picked, until no more elements are
   qualified.

3. The selected $\beta_d$ classes $\{h_{i_1}, h_{i_2}, \ldots, h_{i_{\beta_d}}\}$
   form the optimal homology basis $\mathcal{H}_d$. (Note that the $h$'s are
   ordered by size, i.e. $S(h_{i_k}) \le S(h_{i_{k+1}})$.)

However, we cannot compute the exponentially long sequence $seq(H)$
(exponential in $\beta_d$) directly. Next, we present our greedy algorithm,
which is polynomial.
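
The three steps of the naive greedy method can be sketched in Python as
follows. The sketch assumes that every nontrivial class is given explicitly as
a bitmask over some fixed basis together with its size, which is exactly the
exponential enumeration the paper avoids; the function name `greedy_basis` is
hypothetical. The example sizes are the ones quoted for Figure 7.

```python
def greedy_basis(sizes):
    """Scan classes by increasing size and keep those independent over Z2.

    `sizes` maps a class (int bitmask over a fixed basis) to its size S(h).
    """
    chosen, reduced = [], []
    for h in sorted(sizes, key=lambda c: (sizes[c], c)):  # seq(H)
        v = h
        for b in reduced:
            v = min(v, v ^ b)  # XOR only when it clears b's leading bit
        if v:                  # h is independent of the classes chosen so far
            chosen.append(h)
            reduced.append(v)
    return chosen

# Figure 7: S([z_1]) = 2, S([z_2]) = 4, S([z_3]) = S([z_1] + [z_2]) = 5.
figure7 = greedy_basis({0b01: 2, 0b10: 4, 0b11: 5})
```

For Figure 7 the scan keeps $[z_1]$ and $[z_2]$ and rejects $[z_3]$, because
$[z_3]$ is the sum of the two classes already chosen.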
4.2.2 Computing $\mathcal{H}_d$ with a Sealing Technique

In this section, we introduce the algorithm for computing $\mathcal{H}_d$.
Instead of computing the exponentially long sequence $seq(H)$ directly, our
algorithm uses a sealing technique and takes time polynomial in $\beta_d$.

We start by measuring and localizing the smallest homology class of the given
simplicial complex K, which is also the first class we choose for
$\mathcal{H}_d$. We destroy this class by sealing up one of its cycles, i.e.
the localized cycle we computed, with new simplices. Next, we measure and
localize the smallest homology class of the augmented simplicial complex K'.
This class is the second smallest homology class in $\mathcal{H}_d$. We destroy
this class again and proceed to the third smallest class in $\mathcal{H}_d$.
This process is repeated for $\beta_d$ rounds, yielding $\mathcal{H}_d$.

We destroy a homology class by sealing up the class's localized cycle, which
we have computed. To seal up this cycle $z$, we add (a) a new vertex $v$; (b) a
$(d+1)$-simplex for each $d$-simplex of $z$, with vertex set equal to the
vertex set of the $d$-simplex together with $v$; (c) all of the faces of these
new simplices. In Figure 7, a 1-cycle with four edges, $z_1$, is sealed up with
one new vertex, four new triangles and four new edges.

We assign the new vertices $+\infty$ geodesic distance from any vertices with
which they share an edge in the original complex K. Whenever we run the
persistent homology algorithm, all of the new simplices have $+\infty$ filter
function values. Furthermore, in the procedure Measure-Smallest(K'), we will
not consider any geodesic ball centered at these new vertices. In other words,
the geodesic distance from these new vertices will never be used as a filter
function. Algorithm 4 contains the pseudocode.

Algorithm 4 Measure-All(K)
Goal: compute the optimal homology basis, $\mathcal{H}_d$.
Input: K: the given simplicial complex.
Output: $\mathcal{H}_d$: the optimal homology basis.
1: K' = K
2: $\mathcal{H}_d = \emptyset$
3: for $i = 1$ to $\beta_d$ do
4:    $h = (S, z)$ = Measure-Smallest(K')
5:    $\mathcal{H}_d = \mathcal{H}_d \cup \{h\}$
6:    seal $z$ with new simplices, augment K' accordingly
7:    $\forall \sigma \in K' \setminus K$, $p \in K$: $f_p(\sigma) = +\infty$
8: end for
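
The sealing step in line 6 of Algorithm 4 can be sketched in Python as follows.
The sketch assumes simplices are stored as sorted tuples of integer vertices
and that the new vertex receives a label larger than every existing one, so
that the cone simplices stay sorted. The function name `seal_cycle` is
hypothetical, and the assignment of $+\infty$ filter values to the new
simplices is left to the caller.

```python
from itertools import combinations

def seal_cycle(complex_, cycle, new_vertex):
    """Cone every d-simplex of `cycle` to `new_vertex`.

    Returns the set of simplices that must be added to `complex_`: the new
    vertex, the (d+1)-simplices of the cone, and all of their missing faces.
    """
    added = set()
    for simplex in cycle:
        cone = tuple(sorted(simplex)) + (new_vertex,)
        for k in range(1, len(cone) + 1):
            for face in combinations(cone, k):
                if face not in complex_:  # keep only genuinely new simplices
                    added.add(face)
    return added

# A 1-cycle with four edges, as in Figure 7.
square = {(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3), (0, 3)}
cycle = [(0, 1), (1, 2), (2, 3), (0, 3)]
added = seal_cycle(square, cycle, new_vertex=4)
counts = {k: sum(1 for s in added if len(s) == k) for k in (1, 2, 3)}
```

For the four-edge cycle the sketch adds one vertex, four edges and four
triangles, matching the count given for Figure 7.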



Next, we prove that this algorithm does compute the optimal homology
basis $\mathcal{H}_d$. We will prove in Theorem 4.5 that Measure-All(K)
produces the same result as the naive greedy method presented in the previous
section. We begin by proving a lemma, based on the assumption in Footnote 2
that $h_{min}$ is the only nontrivial homology class carried by
$B_{min}(h_{min})$.

Lemma 4.4. Given a simplicial complex K, if we seal up its smallest homology
class $h_{min}(K)$, any other nontrivial homology class of K, $h$, is still
nontrivial in the augmented simplicial complex K'. In other words, any cycle of
$h$ is still nonbounding in K'.

Proof. As we deal with two complexes K and K' with $K \subseteq K'$, we let
$I : C_d(K) \to C_d(K')$ and $I_* : H_d(K) \to H_d(K')$ be the maps induced by
inclusion. Also, for a chain $c$, let $|c|$ be the simplicial complex composed
of simplices from $c$ and their faces.

We proceed by contradiction. Let $z_{min} \in h_{min}(K)$ be the localized
cycle of $h_{min}(K)$ that we seal up. For any nontrivial class
$h \in H_d(K)$, $h \neq h_{min}(K)$, suppose $I_*(h)$ is trivial. We will show
that there exists a cycle in $h$ which is carried by $B_{min}(h_{min})$, which
contradicts the fact that $h_{min}$ is the only nontrivial class carried by
$B_{min}(h_{min})$.

Suppose $I_*(h)$ is trivial. For any cycle $z \in h$, its corresponding $I(z)$
is the boundary of a $(d+1)$-chain in K'. As $z$ is nonbounding in K, at least
one of the simplices of this $(d+1)$-chain must be new. That is,

$$I(z) = \partial_{d+1}\Big(\sum_{\sigma \in K' \setminus K} a_\sigma \sigma + \sum_{\tau \in K} a_\tau \tau\Big),$$

where at least one $a_\sigma \neq 0$. But there exists a cycle $z'$ which is
homologous to $z$ in K, with
$z' = z - \partial_{d+1}(\sum_{\tau \in K} a_\tau \tau)$, which yields, finally,
that $I(z') = \partial_{d+1}(\sum_{\sigma \in K' \setminus K} a_\sigma \sigma)$.
In other words, $I(z')$ is the boundary of a $(d+1)$-chain all of whose
simplices are new. Any simplex of $I(z')$ is a face of the new simplices and
belongs to the original complex K, and thus belongs to $|I(z_{min})|$. It
follows that $I(z')$ is carried by the simplicial complex corresponding to
$I(z_{min})$, $|I(z_{min})|$; and hence, $z'$ is carried by $|z_{min}|$.
Consequently, $z'$ and $h$ are carried by $B_{min}(h_{min})$, which leads to the
desired contradiction. □

Theorem 4.5. The procedure Measure-All(K) computes $\mathcal{H}_d$.

Proof. We prove the theorem by showing that the sealing technique produces
the same result as the naive greedy algorithm, namely,
$\mathcal{H}_d = \{h_{i_1}, h_{i_2}, \ldots, h_{i_{\beta_d}}\}$. We show that
for any $l \le \beta_d$, after computing and sealing up the first $l - 1$
classes of $\mathcal{H}_d$, i.e. $\{h_{i_1}, \ldots, h_{i_{l-1}}\}$, the next
class we choose is exactly $h_{i_l}$. In other words, the localized cycle and
size of the smallest class of the augmented simplicial complex $K^{l-1}$ are
equal to those of $h_{i_l}$.

First, any class between $h_{i_{l-1}}$ and $h_{i_l}$ in $seq(H)$ will not be
chosen. Any such class $h_j$ is linearly dependent on classes that have already
been chosen, namely, $\{h_{i_1}, \ldots, h_{i_{l-1}}\}$. Since these classes
have been sealed up, a cycle of $h_j$ is a boundary in $K^{l-1}$. Thus, $h_j$
cannot be chosen.

Second, Lemma 4.4 leads to the fact that any class in $seq(H)$ that is not
linearly dependent on $\{h_{i_1}, \ldots, h_{i_{l-1}}\}$ is nontrivial in
$K^{l-1}$.

Third, the smallest class of $K^{l-1}$, $h_{min}(K^{l-1})$, corresponds to
$h_{i_l}$: any new simplex belonging to $K^{l-1} \setminus K$ will not change
the computation of the geodesic balls $B_p^r$ with finite radius $r$, and thus
will change neither the size measurement nor the localization. Thus, the
$h_{min}(K^{l-1})$ computed by the sealing technique is identical to $h_{i_l}$
computed by the naive greedy method, in terms of the size and the localized
cycle. □

The algorithm is illustrated in Figure 7. The rectangle, $z_1$, and the
octagon, $z_2$, are the localized cycles of the smallest and the second
smallest homology classes ($S([z_1]) = 2$, $S([z_2]) = 4$). The nonbounding
cycle $z_3 = z_1 + z_2$ corresponds to the largest nontrivial homology class
$[z_3] = [z_1] + [z_2]$ ($S([z_3]) = 5$). After the first round, we choose
$[z_1]$ as the smallest class in $\mathcal{H}_1$. Next, we destroy $[z_1]$ by
sealing up $z_1$, which yields the augmented complex K'. This time, we choose
$[z_2]$, giving $\mathcal{H}_1 = \{[z_1], [z_2]\}$.

Figure 7: Left: the original complex K. Right: the augmented complex K'
after sealing up the smallest class, $[z_1]$.



4.3 Complexity 

We analyze the complexity of the non-refined algorithm. Denote $n$ and $m$ as
the upper bounds of the total numbers of simplices of the original complex K
and the intermediate complex K', respectively. The algorithm runs the procedure
Measure-Smallest $\beta_d$ times with the input K', and thus runs the
procedures Bmin and Localized-Cycle $\beta_d$ times with the input K'.

The procedure Bmin runs the persistent homology algorithm on the intermediate
complex, K', using filter function $f_p$ for each vertex of the original
complex, K. Therefore, each time Bmin is called, it takes $O(nm^3)$ time.

The procedure Localized-Cycle runs the persistent homology algorithm once,
and Wiedemann's rank computation algorithm $O(m)$ times. The matrices used
for rank computations are $[z, \partial_{d+1}]$, which have $O(m)$ nonzero
entries. Therefore, each time Localized-Cycle is called, it takes
$O(m^3 \log^2 m)$ time.

In total the whole algorithm takes
$O(\beta_d(nm^3 + m^3 \log^2 m)) = O(\beta_d n m^3)$ time. Next, we bound $m$,
the size of the intermediate simplicial complex K'. During the algorithm, we
seal up $\beta_d$ nonbounding cycles. For each sealing, the number of newly
added simplices is bounded by the number of simplices of the sealed cycle. As
we have shown, each cycle we seal up only contains simplices in the original
complex K. Therefore, the number of new simplices used to seal up each cycle is
$O(n)$. The size of the intermediate simplicial complex, K', is
$O(\beta_d n)$ throughout the whole algorithm.

Finally, substitute $\beta_d n$ for $m$. We conclude that the algorithm takes
$O(\beta_d n m^3) = O(\beta_d n (\beta_d n)^3) = O(\beta_d^4 n^4)$ time.

5 An Improvement Using Finite Field Linear Algebra

In this section, we present an improvement on the algorithm presented in the
previous section, more specifically, an improvement on the procedure Bmin(K).
The idea is based on the finite field linear algebra behind the homology.

We first observe that for neighboring vertices, $p_1$ and $p_2$, the
persistence diagrams using $f_{p_1}$ and $f_{p_2}$ as filter functions are
close. In Theorem 5.3, we prove that the birth times of the first essential
homology classes using $f_{p_1}$ and $f_{p_2}$ differ by no more than 1. This
observation suggests that for each $p$, instead of computing $B_p^{r(p)}$ we
may just test whether a certain geodesic ball carries any essential homology
class. Second, with some algebraic insight, we reduce the problem of testing
whether a geodesic ball carries any essential homology class to the problem of
comparing dimensions of two vector spaces. Furthermore, we use Theorem 5.5 to
reduce the problem to rank computations of sparse matrices over the field
$\mathbb{Z}_2$, for which we have ready tools (of Wiedemann [20]).

In doing so, we improve the complexity of computing the optimal homology
basis to $O(\beta_d^4 n^3 \log^2 n)$.

Remark 5.1. This complexity is close to that of the persistent homology
algorithm, whose complexity is $O(n^3)$. Given the nature of the problem, it
seems likely that the persistence complexity is a lower bound. If this is the
case, the current algorithm is nearly optimal.

Remark 5.2. Cohen-Steiner et al. [8] provided a linear algorithm to maintain
the persistence diagram while changing the filter function. However, this
algorithm is not directly applicable in our context. The reason is that it
takes $O(n)$ time to update the persistence diagram for a transposition in the
simplex ordering. In our case, even for filter functions of two neighboring
vertices, it may take $O(n^2)$ transpositions to transform one simplex ordering
into the other. Therefore, updating the persistence diagram while changing the
filter function takes $O(n^2) \times O(n) = O(n^3)$ time. This is the same
amount of time it would take to compute the persistence diagram from scratch.

In this section, we assume that K has a single component; multiple compo- 
nents can be accommodated with a simple modification. For convenience, we 
use "carrying nonbounding cycles" and "carrying essential homology classes" 
interchangeably, because a geodesic ball carries essential homology classes of K 
if and only if it carries nonbounding cycles of K.

5.1 The Stability of Persistence Leads to an Improvement



Cohen-Steiner et al. [7] proved that the change, suitably defined, of the
persistence of homology classes is bounded by the changes of the filter
functions. Since the filter functions of two neighboring vertices, $f_{p_1}$
and $f_{p_2}$, are close to each other, the birth times of the first
nonbounding cycles in both filters are close as well. This leads to
Theorem 5.3.

Theorem 5.3. If two vertices $p_1$ and $p_2$ are neighbors, the birth times of
the first nonbounding cycles for filter functions $f_{p_1}$ and $f_{p_2}$
differ by no more than 1.

Proof. We first prove that the filter functions are close for two neighboring
vertices $p_1$ and $p_2$, formally,

$$\|f_{p_1} - f_{p_2}\|_\infty \le 1. \qquad (2)$$

For any vertex $q$, we can connect $q$ and $p_2$ by concatenating the edge
$(p_1, p_2)$ to the shortest path connecting $q$ and $p_1$. Therefore the
geodesic distance between $q$ and $p_2$ is no greater than one plus the
geodesic distance between $q$ and $p_1$, formally,

$$f_{p_2}(q) \le 1 + f_{p_1}(q).$$

It is trivial to see that we can switch $p_1$ and $p_2$ in this inequality.
Therefore, we have

$$|f_{p_1}(q) - f_{p_2}(q)| \le 1.$$

It is not hard to extend this inequality from any vertex $q \in$ vert(K) to
any simplex $\sigma \in K$. Therefore, Equation (2) is proven.

Next, we show that the birth times of the first nonbounding cycles in the
two filter functions are close, formally,

$$|f_{p_1}(z') - f_{p_2}(z'')| \le 1, \qquad (3)$$

where $z'$ and $z''$ are the first nonbounding cycles in the filters $f_{p_1}$
and $f_{p_2}$, respectively. Here, by slightly abusing the notation, we denote
$f(z)$ as the birth time of the cycle $z$ in the filter $f$.

It is not hard to see that the birth time of any cycle $z$ is the maximum of
the function values of its simplices, and thus is the maximum of the function
values of its vertices, formally,

$$f(z) = \max_{q \in vert(z)} f(q).$$

We prove Equation (3) by contradiction. Suppose

$$f_{p_1}(z') - f_{p_2}(z'') \ge 2.$$

We know that for any vertex $q \in vert(z'')$,

$$f_{p_2}(q) \le f_{p_2}(z'') \le f_{p_1}(z') - 2.$$

From Equation (2), we have

$$f_{p_1}(q) \le f_{p_2}(q) + 1 \le f_{p_1}(z') - 1, \quad \forall q \in vert(z''),$$
$$f_{p_1}(z'') = \max_{q \in vert(z'')} f_{p_1}(q) \le f_{p_1}(z') - 1.$$

This contradicts the fact that $z'$ is the first nonbounding cycle in the
filter $f_{p_1}$. Therefore, the assumption is wrong, and

$$f_{p_1}(z') - f_{p_2}(z'') \le 1.$$

Similarly, we can prove that

$$f_{p_2}(z'') - f_{p_1}(z') \le 1.$$

In summary, we have proven Equation (3), and consequently, proven the
theorem. □
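
Equation (2) is easy to check numerically on a small example. The following
Python sketch computes hop-count geodesic distances by breadth-first search and
measures the largest gap between the filter functions of any two neighboring
vertices. The adjacency list (a 4-cycle with a tail) and the function name
`bfs_distances` are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop-count geodesic distance from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

# Hypothetical 1-skeleton: the 4-cycle 0-1-2-3 with a tail 3-4-5.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2, 4], 4: [3, 5], 5: [4]}
dist = {p: bfs_distances(adj, p) for p in adj}

# Largest difference |f_{p1}(q) - f_{p2}(q)| over neighbors p1, p2 and all q.
gap = max(abs(dist[p1][q] - dist[p2][q])
          for p1 in adj for p2 in adj[p1] for q in adj)
```

The gap is never larger than 1, and it equals 1 here because $q = p_1$ already
gives $|0 - 1| = 1$.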

This theorem suggests a way to avoid computing $B_p^{r(p)}$ for all
$p \in K$. Recall that $r(p)$ is the radius of the smallest geodesic ball
centered at $p$ that carries any nonbounding cycle. Based on this theorem, we
know that for any vertex $p_i$, $r(p_i) \ge r(p_j) - 1$ for any neighbor
$p_j$. Since our objective is to find the minimum of the $r(p)$'s, we can do a
breadth-first search through all the vertices with global variables $r_{min}$
recording the smallest $r(p)$ we have found, and $p_{min}$ recording the
corresponding center $p$.

We start by applying the persistent homology algorithm on K with filter
function $f_{p_0}$. Initialize $r_{min}$ as the birth time of the first
nonbounding cycle of K, i.e. $r(p_0)$, and $p_{min}$ as $p_0$. Next, we do a
breadth-first search through the rest of the vertices. For each vertex $p_i$,
$i \neq 0$, we know there exists a neighbor $p_j$ such that
$r(p_j) \ge r_{min}$. Therefore,

$$r(p_i) \ge r(p_j) - 1 \ge r_{min} - 1.$$

We only need to test whether the geodesic ball $B_{p_i}^{r_{min}-1}$ carries
any nonbounding cycle of K. If so, $r_{min}$ is decremented by one, and
$p_{min}$ is updated to $p_i$.

However, testing whether the subcomplex $B_{p_i}^{r_{min}-1}$ carries any
nonbounding cycle of K is not as easy as computing nonbounding cycles of the
subcomplex. A nonbounding cycle of $B_{p_i}^{r_{min}-1}$ may not be nonbounding
in K as we require. For example, in Figure 8, we want to compute the smallest
geodesic ball centered at $p$ carrying any nonbounding cycle of K,
$B_p^{r(p)}$. The gray geodesic ball in the first figure does not carry any
nonbounding cycle of K, although it carries its own nonbounding cycles. The
geodesic ball in the second figure carries nonbounding cycles of K and is the
ball we want, namely, $B_p^{r(p)}$. Therefore, we need algebraic tools to
distinguish nonbounding cycles of K from those of the subcomplex
$B_p^{r_{min}-1}$.

Figure 8: Computing $B_p^{r(p)}$ in a torus with tail. The ball in the second
figure is what we want, although the one in the first figure has nontrivial
topology.

5.2 Testing Whether a Subcomplex Carries Nonbounding Cycles of K

In this subsection, we present the procedure for testing whether a subcomplex
$K_0$ carries any nonbounding cycle of K. A chain in $K_0$ is a cycle if and
only if it is a cycle of K. However, solely from $K_0$, we are not able to tell
whether a cycle carried by $K_0$ bounds or not in K. Instead, we write the set
of cycles carried by $K_0$, $Z_d^{K_0}(K)$, and the set of boundaries of K
carried by $K_0$, $B_d^{K_0}(K)$, as sets of linear combinations with certain
constraints. Consequently, we are able to test whether any cycle carried by
$K_0$ is nonbounding in K by comparing the dimensions of $Z_d^{K_0}(K)$ and
$B_d^{K_0}(K)$. Theorem 5.5 shows that these dimensions can be computed by rank
computations of sparse matrices.

5.2.1 Expressing $Z_d^{K_0}(K)$ and $B_d^{K_0}(K)$ as Sets of Linear
Combinations with Certain Constraints

The set of cycles and the set of boundaries of K carried by $K_0$ are

$$Z_d^{K_0}(K) = Z_d(K) \cap C_d(K_0) \quad \text{and} \quad B_d^{K_0}(K) = B_d(K) \cap C_d(K_0),$$

respectively. Since $Z_d(K)$, $B_d(K)$ and $C_d(K_0)$ are all vector spaces,
$Z_d^{K_0}(K)$ and $B_d^{K_0}(K)$ are both vector spaces. Furthermore, since
$B_d(K)$ is a subspace of $Z_d(K)$, $B_d^{K_0}(K)$ is a subspace of
$Z_d^{K_0}(K)$. It is not hard to show that the subcomplex $K_0$ carries
nonbounding cycles of K if and only if the dimensions of these two vector
spaces are different.

We want to express these two vector spaces as linear combinations such that
we can compute their dimensions using algebraic tools. We first express the
vector spaces $B_d(K)$ and $Z_d(K)$ as sets of linear combinations. Since
$B_d(K)$ is the column space of $\partial_{d+1}$, a boundary of K can be
written as the linear combination of column vectors of $\partial_{d+1}$. The
boundary group can be written as the set of linear combinations

$$B_d(K) = \{\partial_{d+1} \gamma \mid \gamma \in \mathbb{Z}_2^{n_{d+1}}\}.$$

The cycle group $Z_d(K)$ is the union of $B_d(K)$ and all the nonbounding
cycles of K. Suppose we are given a basis for $H_d(K)$,
$\{h_1, \ldots, h_{\beta_d}\}$, together with a cycle for each $h_i$, namely,
$z_i \in h_i$. Elements in $h_i$ can be written as
$z_i + \partial_{d+1} \gamma$. Furthermore, elements in $Z_d(K)$ can be written
as linear combinations of
$\{b_1, \ldots, b_{n_{d+1}}, z_1, \ldots, z_{\beta_d}\}$, where the $b_j$'s are
the column vectors of $\partial_{d+1}$. We have

$$Z_d(K) = \{Z_d \gamma \mid \gamma \in \mathbb{Z}_2^{n_{d+1} + \beta_d}\},$$

where $Z_d = [\partial_{d+1}, H_d]$ and $H_d = [z_1, \ldots, z_{\beta_d}]$.

Remark 5.4. In our algorithm, the boundary matrix $\partial_{d+1}$ is given. We
can also precompute the matrix $H_d$ by computing an arbitrary basis of
$H_d(K)$ and representative cycles of classes in this basis. More details will
be provided in Section 5.3.

Since $C_d(K_0)$ is the set of chain vectors whose $i$-th entry is zero for any
simplex $\sigma_i \notin K_0$, we can write $Z_d^{K_0}(K)$ and $B_d^{K_0}(K)$
as the elements of $Z_d(K)$ and $B_d(K)$ whose $i$-th entries are zero for all
$\sigma_i \notin K_0$. Consequently, we can write them as linear combinations
with certain constraints,

$$B_d^{K_0}(K) = \{\partial_{d+1} \gamma \mid \gamma \in \mathbb{Z}_2^{n_{d+1}},\ \partial_{d+1}^i \gamma = 0 \ \forall \sigma_i \notin K_0\},$$
$$Z_d^{K_0}(K) = \{Z_d \gamma \mid \gamma \in \mathbb{Z}_2^{n_{d+1} + \beta_d},\ Z_d^i \gamma = 0 \ \forall \sigma_i \notin K_0\},$$

where $\partial_{d+1}^i$ and $Z_d^i$ are the $i$-th rows of the matrices
$\partial_{d+1}$ and $Z_d$, respectively.

5.2.2 Computing Dimensions by Computing Ranks of Sparse Matrices

With the following theorem, we can compute the dimensions of these two vector
spaces $Z_d^{K_0}(K)$ and $B_d^{K_0}(K)$ by matrix rank computations.

Theorem 5.5. For any matrix
$A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$,
$\dim(\{A\gamma \mid A_2 \gamma = 0\}) = rank(A) - rank(A_2)$.

Proof. For simplicity, denote $a$ as $(rank(A) - rank(A_2))$. There are
$rank(A)$ linearly independent rows in $A$, and $rank(A_2)$ linearly
independent rows in $A_2$. Therefore, there are $a$ rows in $A_1$ that are
linearly independent, and not linearly dependent on rows of $A_2$. Choose one
such set of rows from $A_1$,
$A_1' = \begin{bmatrix} a_1 \\ \vdots \\ a_a \end{bmatrix}$. Since all the rows
of $A$ are dependent on rows in $A_1'$ and $A_2$, for any
$\gamma \in nullspace(A_2)$, $A\gamma$ is determined by $A_1'\gamma$.

Proving the theorem is equivalent to showing that $A_1'\gamma$ can be an
arbitrary vector in the vector space $\mathbb{Z}_2^a$. It is sufficient to show
that for any row of $A_1'$, $a_i$, the following two statements are both true:

1. There exist $\gamma_0, \gamma_1 \in nullspace(A_2)$, such that
   $a_i \gamma_0 = 0$ and $a_i \gamma_1 = 1$.

2. For any $\gamma \in nullspace(A_2)$, $a_i \gamma$ does not linearly depend
   on the products $a_j \gamma$ for the rest of the rows $a_j$ in $A_1'$.

For the first statement, choose $\gamma_0 = 0 \in nullspace(A_2)$, which
satisfies $a_i \gamma_0 = 0$. Now we show that $\gamma_1$ exists by
contradiction. Suppose $a_i \gamma = 0$ for all $\gamma \in nullspace(A_2)$.
This implies that

$$nullspace(A_2) \subseteq nullspace\Big(\begin{bmatrix} a_i \\ A_2 \end{bmatrix}\Big) \implies rank\Big(\begin{bmatrix} a_i \\ A_2 \end{bmatrix}\Big) \le rank(A_2).$$

This contradicts the linear independence of $a_i$ with regard to $A_2$.
Therefore, $a_i \gamma$ can be either 0 or 1 for $\gamma \in nullspace(A_2)$.
In fact, this statement is generally true for any row vector $a$ which is
linearly independent of the rows in $A_2$.

For the second statement, again we prove by contradiction. Suppose
$a_i \gamma = \sum_j (a_j \gamma)$ for some rows of $A_1'$, the $a_j$'s. Define
a row vector $a_0 = a_i - \sum_j a_j$. We have

$$a_0 \gamma = \Big(a_i - \sum_j a_j\Big) \gamma = 0.$$

Since $a_0$ is linearly independent of $A_2$, this contradicts the first
statement we have just proved. By contradiction, the second statement is true.

In conclusion, for all $\gamma \in nullspace(A_2)$, $A\gamma$ depends on
$A_1'\gamma$, whose range space has dimension $a$. □
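
Theorem 5.5 can be checked by brute force on tiny matrices. The following
Python sketch encodes each matrix row as an integer bitmask over the columns,
enumerates every $\gamma$, and compares the dimension of the constrained image
with $rank(A) - rank(A_2)$. The helper names and the example matrices are
hypothetical, and the enumeration is exponential in the number of columns, so
it only serves as a sanity check.

```python
def gf2_rank(rows):
    """Rank over Z2 of rows encoded as ints (bit k = entry in column k)."""
    basis = []
    for v in rows:
        for b in basis:
            v = min(v, v ^ b)  # XOR only when it clears b's leading bit
        if v:
            basis.append(v)
    return len(basis)

def constrained_image_dim(a1, a2, ncols):
    """dim{A gamma : A2 gamma = 0} over Z2, by enumerating every gamma.

    Because A2 gamma = 0 on this set, only the A1 part of A gamma can vary.
    """
    image = set()
    for gamma in range(1 << ncols):
        dot = lambda row: bin(row & gamma).count("1") & 1  # row . gamma mod 2
        if all(dot(r) == 0 for r in a2):
            image.add(tuple(dot(r) for r in a1))
    return len(image).bit_length() - 1  # a subspace has 2^dim elements

# Hypothetical 3-column example: A1 has rows 011 and 110, A2 has the row 101.
a1, a2 = [0b011, 0b110], [0b101]
lhs = constrained_image_dim(a1, a2, 3)
rhs = gf2_rank(a1 + a2) - gf2_rank(a2)
```

In the example the three rows span a 2-dimensional space and $A_2$ has rank 1,
so both sides equal 1.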

It is trivial to see that the order of the rows in these matrices does not
interfere with the correctness of the theorem. Consequently, the matrix $A_2$
can be a certain subset of the rows of $A$, not necessarily the last few rows.
Therefore, we can compute the dimensions of $B_d^{K_0}(K)$ and $Z_d^{K_0}(K)$
as

$$\dim(B_d^{K_0}(K)) = rank(\partial_{d+1}) - rank(\partial_{d+1}^{K \setminus K_0}), \quad \text{and}$$
$$\dim(Z_d^{K_0}(K)) = rank(Z_d) - rank(Z_d^{K \setminus K_0}),$$

where $\partial_{d+1}^{K \setminus K_0}$ and $Z_d^{K \setminus K_0}$ are the
matrices formed by rows of $\partial_{d+1}$ and $Z_d$ whose corresponding
simplices do not belong to $K_0$.

We test whether $K_0$ carries any nonbounding cycle of K by testing whether
these two dimensions are different. As we know, columns in $H_d$ correspond to
$\beta_d$ nonbounding cycles whose classes form a homology basis. Therefore,
the ranks of $Z_d$ and $\partial_{d+1}$ differ by $\beta_d$. $K_0$ carries
nonbounding cycles of K if and only if

$$rank(Z_d^{K \setminus K_0}) - rank(\partial_{d+1}^{K \setminus K_0}) \neq \beta_d.$$

5.2.3 Procedure Contain-Nonbounding-Cycle(K, $K_0$, $H_d$)

With all the facts in hand, we are now ready to state the algorithm for testing
whether a subcomplex carries any nonbounding cycle of K. We use the algorithm
of Wiedemann [20] for the rank computation. See Algorithm 5 for the
pseudocode.

Algorithm 5 Contain-Nonbounding-Cycle(K, $K_0$, $H_d$)
Goal: test whether $K_0$ carries nonbounding cycles of K.
Input: K: the given simplicial complex.
       $K_0$: the subcomplex.
       $H_d$: $\beta_d$ linearly independent nonbounding cycles of K.
Output: Boolean.
1: $Z_d = [\partial_{d+1}, H_d]$
2: compute $\partial_{d+1}^{K \setminus K_0}$ and $Z_d^{K \setminus K_0}$ by
   picking rows of $\partial_{d+1}$ and $Z_d$ whose corresponding simplices do
   not belong to $K_0$
3: if $rank(Z_d^{K \setminus K_0}) - rank(\partial_{d+1}^{K \setminus K_0}) \neq \beta_d$ then
4:    return true
5: else
6:    return false
7: end if
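
Algorithm 5 can be sketched in Python as follows. The sketch assumes each
matrix column is given as the set of $d$-simplices with a nonzero entry, builds
the restricted rows as integer bitmasks over the columns, and uses Gaussian
elimination over $\mathbb{Z}_2$ in place of Wiedemann's algorithm [20]. The
function names and the five-edge example complex (one filled triangle next to
one hollow triangle) are hypothetical.

```python
def gf2_rank(rows):
    """Rank over Z2 of rows encoded as ints (bit j = entry in column j)."""
    basis = []
    for v in rows:
        for b in basis:
            v = min(v, v ^ b)  # XOR only when it clears b's leading bit
        if v:
            basis.append(v)
    return len(basis)

def contains_nonbounding_cycle(d_simplices, boundary_cols, basis_cycles, k0):
    """Does the subcomplex k0 carry a cycle that is nonbounding in K?"""
    z_cols = boundary_cols + basis_cycles              # Z_d = [d_{d+1}, H_d]
    outside = [s for s in d_simplices if s not in k0]  # rows of K \ K_0

    def row(cols, s):
        # Row of simplex s, as a bitmask over the given columns.
        return sum(1 << j for j, col in enumerate(cols) if s in col)

    rank_z = gf2_rank([row(z_cols, s) for s in outside])
    rank_b = gf2_rank([row(boundary_cols, s) for s in outside])
    return rank_z - rank_b != len(basis_cycles)        # beta_d = number of cycles

# Hypothetical complex: triangle (1,2,3) is filled, triangle (0,1,2) is hollow.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
boundary_2 = [{(1, 2), (1, 3), (2, 3)}]   # columns of the boundary matrix d_2
h1 = [{(0, 1), (0, 2), (1, 2)}]           # one nonbounding cycle, beta_1 = 1

answers = [contains_nonbounding_cycle(edges, boundary_2, h1, k0) for k0 in (
    {(1, 2), (1, 3), (2, 3)},             # carries only a boundary
    {(0, 1), (0, 2), (1, 2)},             # carries the hollow cycle
    {(0, 1), (0, 2), (1, 3), (2, 3)},     # carries a homologous cycle
    {(0, 1), (0, 2)},                     # carries no cycle at all
)]
```

The first subcomplex has nontrivial topology of its own but carries no
nonbounding cycle of K, which is the situation illustrated by Figure 8.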



5.3 The Improved Algorithm 

Next we present the improved version of the procedure Bmin(K). Theorem 5.3
suggests performing a breadth-first search with a global variable $r_{min}$
and testing whether $B_p^{r_{min}-1}$ contains nonbounding cycles of K for each
$p$. We use the procedure Contain-Nonbounding-Cycle(K, $K_0$, $H_d$) presented
in the previous subsection for the testing. See Algorithm 6.

Algorithm 6 Bmin(K)
Goal: computing $B_{min}(h_{min})$, improved version.
Input: K: the given simplicial complex.
Output: $p_{min}$, $r_{min}$: the center and radius of $B_{min}(h_{min})$.
1: precompute $H_d$
2: compute a breadth-first ordering of vert(K), $(p_1, \ldots, p_{n_0})$
3: apply the persistent homology algorithm on K with filter function $f_{p_1}$
4: $r_{min}$ = the birth time of the first essential homology class
5: $p_{min} = p_1$
6: for $i = 2$ to $n_0$ do
7:    if Contain-Nonbounding-Cycle(K, $B_{p_i}^{r_{min}-1}$, $H_d$) then
8:       $r_{min} = r_{min} - 1$
9:       $p_{min} = p_i$
10:   end if
11: end for
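
The control flow of Algorithm 6 can be sketched in Python as follows. To keep
the sketch self-contained, the persistence computation and the
Contain-Nonbounding-Cycle test are passed in as callables; the names
`bmin_improved`, `first_birth` and `carries` are hypothetical, and the example
replaces both callables with a lookup in an assumed table of radii $r(p)$ that
satisfies Theorem 5.3.

```python
from collections import deque

def bmin_improved(adj, start, first_birth, carries):
    """Breadth-first search for the center and radius of the smallest ball.

    first_birth(p) returns r(p), e.g. via persistent homology.
    carries(p, r) reports whether the ball of radius r around p carries a
    nonbounding cycle of K.
    """
    order, seen, queue = [], {start}, deque([start])
    while queue:                       # breadth-first ordering of the vertices
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    p_min, r_min = start, first_birth(start)   # one persistence computation
    for p in order[1:]:
        if carries(p, r_min - 1):      # Theorem 5.3: r(p) >= r_min - 1
            r_min -= 1
            p_min = p
    return p_min, r_min

# Hypothetical path graph 0-1-2-3-4 with assumed radii r(p); neighboring
# vertices differ by at most 1, as Theorem 5.3 requires.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
radius = {0: 5, 1: 4, 2: 3, 3: 3, 4: 4}
calls = []
result = bmin_improved(
    adj, 0,
    first_birth=lambda p: calls.append(p) or radius[p],
    carries=lambda p, r: radius[p] <= r,
)
```

Only the start vertex triggers a full persistence computation; every other
vertex costs a single containment test, which is the source of the improvement.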



Precomputing $H_d$. The improved algorithm requires the computation of the
matrix $H_d$, which consists of $\beta_d$ nonbounding cycles representing
elements of a basis of $H_d(K)$. For this purpose, any basis is acceptable. We
can precompute $H_d$ in a similar way to the procedure
Localized-Cycle($p_{min}$, $r_{min}$, K) (Algorithm 3). More specifically, we
perform a column reduction on the boundary matrix $\partial_d$ to compute a
basis for the cycle group $Z_d(K)$. We check elements in this basis one by one
until we collect $\beta_d$ of them, forming $H_d$. For each cycle $z$ in this
cycle basis, we check whether $z$ is linearly independent of the
$d$-boundaries and the nonbounding cycles we have already chosen, i.e. whether

$$rank([z, \partial_{d+1}, H_d']) \neq rank([\partial_{d+1}, H_d']),$$

where $H_d'$ consists of cycles we have already chosen for $H_d$. More details
are omitted due to space limitations.

5.4 Complexity 

We analyze the complexity of the improved algorithm. Denote $n$ and $m$ as the
cardinalities of K and K', respectively. As we know, $m = O(\beta_d n)$.
Similar to the analysis of the non-refined algorithm, the improved algorithm
Measure-All(K) runs the procedures Bmin and Localized-Cycle $\beta_d$ times,
with K' as the input. The procedure Localized-Cycle takes $O(m^3 \log^2 m)$
time.

The improved procedure Bmin precomputes $H_d$ once, applies the persistent
homology algorithm on K' once, and runs the procedure
Contain-Nonbounding-Cycle $O(n)$ times. Precomputing $H_d$ runs the rank
computation $O(m)$ times on matrices with $O(m + \beta_d) = O(m)$ columns and
$O(\beta_d m)$ nonzero entries, and thus takes
$O(m^3 \log m (\beta_d + \log m))$ time. The persistent homology algorithm
takes $O(m^3)$ time. The procedure Contain-Nonbounding-Cycle performs rank
computations on matrices with $O(m + \beta_d) = O(m)$ columns and
$O(\beta_d m)$ nonzero entries, and thus takes
$O(m^2 \log m (\beta_d + \log m))$ time. Therefore, the procedure Bmin takes
$O(m^3 \log m (\beta_d + \log m) + m^3 + n m^2 \log m (\beta_d + \log m)) = O(m^3 \log m (\beta_d + \log m))$
time.

Therefore, the whole improved algorithm takes
$O(\beta_d m^3 \log m (\beta_d + \log m)) = O(\beta_d^4 n^3 \log^2 n)$ time.

6 Consistency with Existing Works in Low Dimension

Erickson and Whittlesey [14] measured a 1-dimensional homology class using the
length of its shortest cycle. They computed the optimal homology basis by
finding the set of nonbounding and linearly independent cycles whose lengths
have the minimal sum. Their algorithm works for 1-dimensional homology classes
in 2-manifolds.

We prove in Theorem 6.2 that our measure, $S(h)$, is quite close to their
measure for 1-dimensional homology classes. For ease of exposition, we first
prove in Lemma 6.1 that by slightly modifying our algorithm for computing the
localized cycle, we can localize the smallest 1-dimensional homology class,
$h_{\min}$, with a representative cycle whose length is no more than
$2S(h_{\min}) + 1$. We start with the modification.

Recall that in the procedure Localized-Cycle($p_{\min}$, $r_{\min}$, $K$), a
localized cycle of $h_{\min}$ is computed, given the smallest geodesic ball
carrying $h_{\min}$, $B_{\min}(h_{\min})$, whose center and radius are
$p_{\min}$ and $r_{\min}$, respectively. More specifically, we compute a basis
of the cycles carried by $B_{\min}(h_{\min})$ by performing a column reduction
on $\partial'_d$, a submatrix of the boundary matrix $\partial_d$. The
submatrix is constructed by picking columns of $\partial_d$ whose corresponding
simplices belong to $B_{\min}(h_{\min})$.

A Modification. When the relevant dimension $d = 1$, we modify our algorithm as
follows. Before performing a column reduction on the submatrix $\partial'_1$,
we sort its rows and columns in ascending order according to the function value
$f_{p_{\min}}$ of their corresponding simplices, that is, vertices for rows and
edges for columns. For edges with the same function value, we sort them in
ascending order according to the minimal function value of their vertices.
After the sorting, we perform a column reduction on $\partial'_1$ to compute a
basis for the cycles carried by $B_{\min}(h_{\min})$. The rest is the same as
the original algorithm.

Next, we prove that this modification will produce a localized cycle of
$h_{\min}$ whose length is no greater than $2S(h_{\min}) + 1$.

Lemma 6.1. The modified algorithm localizes the smallest 1-dimensional
homology class, $h_{\min}$, with a 1-cycle with no more than $2S(h_{\min}) + 1$
edges.

Proof. For simplicity, we prove the case when $K$ has only one connected
component. The general case follows simply.

Because of the properties of the geodesic distance, we observe the following
two facts.

1. For any edge, the function values of its vertices differ by no more than 1.

2. For each vertex $q \neq p_{\min}$, there exists at least one edge with
vertices $q$ and $q'$, such that

$$f_{p_{\min}}(q') = f_{p_{\min}}(q) - 1.$$

By lower edges, we denote edges whose two vertices have different function
values.

These facts imply that in the modified algorithm, a column is reduced to a
nonzero column only if its corresponding edge is a lower edge. To see this,
notice that in the simplex-ordering corresponding to the sorted $\partial'_1$,
for any vertex $q \neq p_{\min}$, among all the edges adjacent to it, lower
edges must appear first. During the reduction, $q$ must be paired with one of
its lower edges. Since $p_{\min}$ corresponds to the 0-dimensional essential
homology class, it is not paired with any edge. Therefore, any edge paired with
a vertex is a lower edge. Any column which is reduced to a nonzero column
corresponds to a lower edge.

The localized cycle we compute, $z_{\min}$, is one of the columns of $V$
corresponding to zero columns in $R$, where $R = \partial'_1 V$. Let it be the
$i$-th column, corresponding to edge $\sigma_i$. It is straightforward to see
that only columns corresponding to lower edges are used to reduce column $i$ of
$\partial'_1$. Consequently, in the computed localized cycle, any edge besides
$\sigma_i$ is a lower edge, and thus has two vertices whose function values
differ by one. Since edge $\sigma_i$ has the function value $S(h_{\min})$,
$z_{\min}$ has no more than $2S(h_{\min}) + 1$ edges. □

For example, in Figure 9, $B_{\min}(h_{\min})$ is centered at $p_1$ with radius
two. Using the modified algorithm, edge $p_3p_4$ corresponds to the nonbounding
cycle. Its column is reduced using edges $p_1p_2$, $p_2p_3$, $p_4p_5$ and
$p_1p_5$, which are all lower edges. The computed localized cycle has length
$5 = 2S(h_{\min}) + 1$.




Figure 9: Edge $p_3p_4$ corresponds to the localized cycle whose length is
$2S(h_{\min}) + 1$.
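
The Figure 9 example can be replayed with a short Python sketch of the sorted
column reduction over $\mathbb{Z}_2$. It is a hedged illustration, not the
paper's implementation: it assumes the function value of an edge is the maximum
over its two vertices, takes the five edges named in the text as the whole ball,
supplies the vertex values by hand as distances from $p_1$, and skips the
nonbounding test of the full procedure. The name `reduce_sorted_boundary` is
hypothetical.

```python
def reduce_sorted_boundary(edges, f):
    """Column-reduce the sorted vertex-by-edge boundary matrix over Z_2.

    edges: list of (u, v) pairs; f: dict mapping each vertex to its function value.
    Returns the sorted edge order, the edges paired with a vertex, and the cycles.
    """
    # Columns: ascending edge value (assumed: max over the two vertices), ties
    # broken by the smaller vertex value, so lower edges precede other edges.
    order = sorted(edges, key=lambda e: (max(f[e[0]], f[e[1]]), min(f[e[0]], f[e[1]])))
    # Rows: vertices in ascending function value.
    row = {v: i for i, v in enumerate(sorted(f, key=f.get))}
    R, V = [], []          # reduced columns and matching columns of V, as bitmasks
    low_to_col = {}        # largest nonzero row index -> column that owns it
    cycles = []
    for j, (u, v) in enumerate(order):
        r = (1 << row[u]) | (1 << row[v])
        c = 1 << j
        while r and (r.bit_length() - 1) in low_to_col:
            k = low_to_col[r.bit_length() - 1]
            r ^= R[k]
            c ^= V[k]
        R.append(r)
        V.append(c)
        if r:
            low_to_col[r.bit_length() - 1] = j
        else:                # zero column: the matching column of V is a cycle
            cycles.append([order[i] for i in range(j + 1) if (c >> i) & 1])
    paired_edges = {order[j] for j in low_to_col.values()}
    return order, paired_edges, cycles

# Vertices p1..p5 are written 1..5; f holds geodesic distances from the center p1.
f = {1: 0, 2: 1, 3: 2, 4: 2, 5: 1}
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
order, paired_edges, cycles = reduce_sorted_boundary(edges, f)
print(order[-1], len(cycles[0]))
```

In this run the edge `(3, 4)` is sorted last, every paired edge is a lower edge,
and the single cycle found uses all five edges, matching $2S(h_{\min}) + 1$ with
$S(h_{\min}) = 2$.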

Based on this lemma, we prove that our result is close to the result of [14],
in which the size of a 1-dimensional homology class is the length of its
shortest representative cycle, namely,

$$S_E(h) = \min_{z \in h} \mathrm{length}(z), \quad h \in H_1(K).$$

Theorem 6.2. For a 1-dimensional homology class $h$,

$$2S(h) \leq S_E(h) \leq 2S(h) + 1.$$


Proof. Lemma 6.1 shows that there exists a representative cycle of $h$ with no
more than $2S(h) + 1$ edges. Therefore, the shortest representative cycle of
$h$ has no more than $2S(h) + 1$ edges. We have

$$S_E(h) \leq 2S(h) + 1.$$

Next, we show that

$$2S(h) \leq S_E(h). \qquad (4)$$

Pick the shortest representative cycle $z_0$ with length $S_E(h)$. Choose any
vertex $p \in z_0$ as the center to build a smallest geodesic ball carrying
$z_0$. The radius of this ball is $S_E(h)/2$ when $S_E(h)$ is even, and
$(S_E(h) - 1)/2$ when $S_E(h)$ is odd. Since $S(h)$ is no greater than this
radius, Equation (4) is proved. □

This theorem shows that our measure tightly bounds the one by Erickson and
Whittlesey. Furthermore, we know the localized cycles computed are almost the
shortest ones.
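
As a small worked check of Theorem 6.2, consider the hypothetical case where
$K$ is a single cycle with $L$ edges and $h$ is its only nontrivial
1-dimensional class. Assuming, as elsewhere in these sketches, that the function
value of an edge is the maximum over its vertices, both bounds are attained:

```latex
% K is a cycle with L edges; every vertex p of K gives a ball of the same radius.
\begin{align*}
L = 2k:     &\quad S(h) = k, \quad S_E(h) = 2k = 2S(h), \\
L = 2k + 1: &\quad S(h) = k, \quad S_E(h) = 2k + 1 = 2S(h) + 1.
\end{align*}
```

For $L = 5$ this gives $S(h) = 2$ and $S_E(h) = 5$, the situation of Figure 9.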

Corollary 6.3. The localized cycle of $h$ computed by the modified algorithm
has at most one more edge than the shortest representative cycle of $h$.

Remark 6.4. In fact, the algorithm can be further modified to generate exactly 
the same result as the one by Erickson and Whittlesey. We omit this because it 
involves more technical details and does not provide any new insights. 

Remark 6.5. Our modified algorithm can compute the shortest representative
cycle for 1-dimensional homology classes no matter what dimension $K$ has,
whereas most of the existing works in low dimension require $K$ to be of
dimension two.

7 Conclusion 

In this paper, we have defined a size measure of homology classes, found cycles
localizing these classes, and computed an optimal homology basis for the
homology group. An $O(\beta^4 n^4)$ brute force algorithm has been presented,
which measures and localizes the optimal homology basis by applying the
persistent homology algorithm on the simplicial complex $\beta n$ times. Aided
by Theorems 5.3 and 5.5, we have improved the algorithm to
$O(\beta^4 n^3 \log^2 n)$. Finally, we have shown that our result is similar to
the existing optimal result in low dimensions.

Future directions. We intend to extend our work in two directions.

1. In this paper, a localized cycle $z_0 \in h$ satisfies the condition

$$\mathrm{rad}(z_0) = \min_{p \in K} \max_{q \in \mathrm{vert}(z_0)} \mathrm{dist}(p, q) = \min_{z \in h} \mathrm{rad}(z).$$

   Can we localize $h$ with a representative cycle using other size measures?
   Examples of such measures are:

$$\mathrm{card}(z_0) = \min_{z \in h} \mathrm{card}(z),$$

$$\mathrm{diam}(z_0) = \min_{z \in h} \mathrm{diam}(z),$$

$$\mathrm{radZ}(z_0) = \min_{p} \max_{q} \mathrm{dist}_{z_0}(p, q) = \min_{z \in h} \mathrm{radZ}(z),$$

   where $\mathrm{card}(z)$ is the number of simplices in the cycle $z$ and
   $\mathrm{dist}_{z_0}(p, q)$ is the geodesic distance between $p$ and $q$
   within the representative cycle $z_0$. We conjecture that computing $z_0$
   satisfying the first two constraints is NP-complete.

2. Can we extend the results if we replace the discrete geodesic distance
   with a continuous metric defined on the underlying space of the simplicial
   complex?